Welcome to the SHARP Multi-omics Workshop. The goal of this workshop is to explore statistical methods for the analysis of multi-omic (or multi-view or multi-layer) data in observational studies. From this perspective, many population-based or observational studies supplement a primary goal of investigating the effect of a risk factor on an outcome with additional omic data to better characterize the risk factors (e.g. germline genetics, exposomics), to provide measurements for intermediate variables (e.g. transcriptomics, proteomics, metabolomics, and the microbiome), and/or to define a specific outcome of interest such as a single biomarker or multiple biomarkers. While omic measurements often share a ‘high-dimensional’ aspect, the different omic ‘dimensions’ can vary extensively in their scale of measurement, correlation structure, and strength and proportion of associations. In this context, the investigator is often confronted with an analytic decision between simplicity and complexity. Simple approaches often treat sets of variables in a pairwise independent manner, sacrificing joint evaluation for benefits in interpretability. Complex methods often model joint correlation structures, but can sacrifice ease of interpretation.
Conceptually, multi-omic data can be integrated following several philosophical approaches, as summarized in Picard et al. 2021:
1. Early Integration
2. Mixed Integration
3. Intermediate Integration
4. Late Integration
Dimensional reduction: Within each of these types of approaches, there is often a need to reduce the number of variables for analysis, whether for computational efficiency, to reduce statistical noise, or to identify underlying latent structures/clusters that characterize patterns in the variables. Such reduction techniques generally fall into two main types of approaches.
Since most observational studies use association analysis as the bedrock for inference, in this workshop we will build from the basic association framework and discuss extensions for integrated multi-omic analysis, always focusing on how integration strategies can be used to then investigate the subsequent role of an omic layer or specific feature on an outcome of interest. Accordingly, the workshop focuses on expanding or integrating multi-omic data into an association framework.
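As a point of reference for the basic association framework, the following is a minimal sketch of a feature-by-feature association scan in base R. The data are simulated purely for illustration; the object names (`omic`, `cohort`, `y`, `scan`) and the choice of a Benjamini-Hochberg correction are our assumptions and are not taken from the workshop labs.

```r
set.seed(1)
n <- 200   # number of individuals (hypothetical)
p <- 50    # number of omic features (hypothetical)

# Simulated omic layer: individuals in rows, features in columns
omic <- matrix(rnorm(n * p), n, p,
               dimnames = list(NULL, paste0("feature", seq_len(p))))
cohort <- factor(sample(c("A", "B", "C"), n, replace = TRUE))

# Simulated outcome that depends on the first feature only
y <- 0.8 * omic[, 1] + rnorm(n)

# One linear model per feature, adjusting for cohort
scan <- t(apply(omic, 2, function(x) {
  fit <- summary(lm(y ~ x + cohort))
  fit$coefficients["x", c("Estimate", "Pr(>|t|)")]
}))
colnames(scan) <- c("estimate", "p_value")
scan <- as.data.frame(scan)

# Adjust for multiple testing across features
scan$p_adj <- p.adjust(scan$p_value, method = "BH")

# Features ordered by strength of evidence
head(scan[order(scan$p_value), ])
```

Each feature is tested independently here, which is the "simple" end of the simplicity versus complexity trade-off: the results are easy to read, but the features are never evaluated jointly.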
For example, integrated analysis that utilizes a general mediation framework and ideas of dimension reduction are illustrated in the above figure, with each element of the grid indicating a potential analysis approach. For example, in “Early Integration with High Dimensional Data” (box A), the multiple omics layers are concatenated into a single omics matrix. Then, within a high dimensional mediation framework utilizing feature selection, features from all layers are selected accounting for each omic layer or type within a single mediation model. As an alternative, “Late Integration with High Dimensional Data” (box C) represents an approach that models each omic layer with a separate high dimensional mediation model for feature selection. Results from each layer can then be aggregated or evaluated in a post hoc integrated analysis or interpretation. Alternative approaches can also be implemented that utilize feature extraction or clustering in concert with either “early” or “late” integration. For example, in “Early Integration with Latent Factors” (box D), the multiple omics layers are first concatenated into a single omics matrix and then a feature extraction/clustering/latent estimation procedure is performed on all features from all omics layers. The resulting clusters are then used in downstream mediation analysis for inference on association with the outcome. Similarly, in “Late Integration with Latent Factors” (box F), the feature extraction/clustering/latent estimation is first performed on each omic layer, followed by downstream mediation analysis. Each omic layer is treated independently and results from each analysis are integrated in a post hoc framework.
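To make the contrast between “early” and “late” integration with latent factors concrete, here is a small sketch on simulated data. It is an illustration under simplifying assumptions rather than the workshop's method: the latent factor is taken to be the first principal component from `prcomp()`, and the downstream step is a plain outcome regression standing in for a full mediation model. All object names are hypothetical.

```r
set.seed(2)
n <- 150
latent <- rnorm(n)   # shared signal driving both layers and the outcome

# Two simulated omic layers measured on the same individuals
layer1 <- matrix(rnorm(n * 20), n, 20) + outer(latent, rep(0.7, 20))
layer2 <- matrix(rnorm(n * 30), n, 30) + outer(latent, rep(0.5, 30))
y <- latent + rnorm(n)

# Early integration: concatenate the layers, then extract one latent factor
early <- cbind(scale(layer1), scale(layer2))
pc_early <- prcomp(early)$x[, 1]
fit_early <- lm(y ~ pc_early)

# Late integration: extract a latent factor per layer and model each separately
pc_l1 <- prcomp(scale(layer1))$x[, 1]
pc_l2 <- prcomp(scale(layer2))$x[, 1]
fit_l1 <- lm(y ~ pc_l1)
fit_l2 <- lm(y ~ pc_l2)

# Post hoc comparison of the layer-specific results
late <- rbind(layer1 = summary(fit_l1)$coefficients["pc_l1", ],
              layer2 = summary(fit_l2)$coefficients["pc_l2", ])

summary(fit_early)$coefficients["pc_early", ]
late
```

In the early version a single factor summarizes all 50 features at once; in the late version each layer yields its own factor and the two sets of results are only brought together afterwards.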
To better understand the elements of each type of approach, for the workshop we will discuss the following:
A. Polygenic models and the use of genetic summary statistics data: As an extension to GWAS, these analysis techniques look to combine data into a single risk score (polygenic risk) or use genetic summary statistics from 1) the association of SNPs with an outcome; and 2) the association of SNPs with an intermediate (often high dimensional omic data) to then test the association of the intermediate with the outcome.
B. Interaction analysis: Genomewide interaction analyses that often focus on a single risk factor and how it interacts with genomewide SNP data.
C. Clustering: With omic data, clustering often serves as a key analytic technique within the analysis pipeline. This includes: 1) an initial step of dimension reduction or exploration of a single omic layer or multiple omic layers for downstream association analyses; or 2) the post-processing of high dimensional results from pairwise association analyses of omic data.
D. Mediation: To remain connected to the original biological hypothesis that often guides a study, mediation analysis strives to link the relationships between three sets of variables: 1) the risk factors; 2) the mediators or intermediates; and 3) the outcome. Omic data can be measured for each type of variable (most often on the risk factors and/or the intermediates) and high dimensional mediation techniques (including the incorporation of clustering or latent estimation) can be used for analysis.
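The three-variable structure described under mediation can be sketched with a single mediator and two linear regressions. This is a toy example on simulated data using the classic product-of-coefficients decomposition; it assumes linear effects, no exposure-mediator interaction, and no unmeasured confounding, and the variable names are hypothetical. High dimensional mediation methods covered in the workshop extend well beyond this.

```r
set.seed(3)
n <- 500
x <- rnorm(n)                       # risk factor
m <- 0.5 * x + rnorm(n)             # mediator (intermediate)
y <- 0.4 * m + 0.2 * x + rnorm(n)   # outcome

fit_m <- lm(m ~ x)        # risk factor -> mediator
fit_y <- lm(y ~ x + m)    # mediator -> outcome, adjusting for the risk factor

a        <- coef(fit_m)[["x"]]   # effect of x on m
b        <- coef(fit_y)[["m"]]   # effect of m on y given x
direct   <- coef(fit_y)[["x"]]   # direct effect of x on y
indirect <- a * b                # effect of x on y transmitted through m
total    <- coef(lm(y ~ x))[["x"]]

c(direct = direct, indirect = indirect, total = total)
```

With ordinary least squares fitted on the same sample, the total effect equals the sum of the direct and indirect effects, which gives a quick sanity check on the decomposition.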
Overall, we focus on statistical analyses for association testing with multi-omic data. We will not focus on the “lab-based” methods and techniques for measuring each type of omic data set or the omic-specific quality control or processing required and crucial for successful evaluation and use of omic data. We feel that there are ample training opportunities available that describe the details of these analyses for each type of omic data.
To facilitate the sessions and topics covered during the workshop, we have created a pre-workshop lab. The idea of this pre-workshop lab is to provide a self-guided tour to familiarize you with the data and basic statistical analyses that will serve as the foundation for content presented in the SHARP Multi-omics Workshop.
Each section consists of an R Markdown file (.Rmd) and an .html file. The .html file can be opened in a web browser and provides a formatted version of the presented material. At each stage, code can be revealed by clicking on the “code” button. As an alternative, the .Rmd files can be opened within RStudio and each code chunk can be run to explore the analysis in detail. In addition, the .Rmd files (including this file, “PreworkshopLab.Rmd”) can be “knitted” to create the .html file by clicking the “knit” button in RStudio.
The content in this pre-workshop will be discussed within the first session of the workshop to provide more background and context.
This section describes the data that will be used for many of the labs throughout the workshop. We also present an example of how to construct a MultiAssayExperiment object - an R object for storing multi-view or multiple omic data sets measured on the same individuals.
The data are from the Exposome Data Analysis Challenge (https://www.isglobal.org/-/exposome-data-analysis-challenge). The Exposome dataset represents a real case scenario of an exposome dataset (based on the HELIX project database) with multiple correlated variables (N>100 exposure variables) arising from general and personal environments at different time points, biological molecular data (multi-omics: DNA methylation, gene expression, proteins, metabolomics, exposome) and multiple clinical phenotypes. The population is drawn from a multi-center study, which results in one of the main confounding structures in the dataset.
In addition, for the SHARP Multiomics Workshop, we simulated a germline genetics example dataset.
The HELIX study represents a collaborative project across six
established and ongoing longitudinal population-based birth cohort
studies in six European countries (France, Greece, Lithuania, Norway,
Spain, and the United Kingdom). HELIX used a multilevel study design
with the entire study population totaling 31,472 mother–child pairs,
recruited during pregnancy, in the six existing cohorts (first level); a
subcohort of 1301 mother-child pairs where biomarkers, omics signatures
and child health outcomes were measured at age 6-11 years (second
level); and repeat-sampling panel studies with around 150 children and
150 pregnant women aimed at collecting personal exposure data (third
level). For more details on the study design, see Vrijheid, Slama, et
al. EHP 2014. See https://www.projecthelix.eu/index.php/es/data-inventory
for more information regarding the study.
# Load the workshop data sets. work.dir is assumed to be defined earlier
# as the path to the workshop folder.
load(paste0(work.dir, "/Data/exposome.RData"))
load(paste0(work.dir, "/Data/proteome.RData"))
load(paste0(work.dir, "/Data/genome.RData"))
load(paste0(work.dir, "/Data/metabol_serum.RData"))
load(paste0(work.dir, "/Data/metabol_urine.RData"))
outdoor.exposures <- exposome[,c("ID", as.character(codebook$variable_name[codebook$domain=="Outdoor exposures"]))] %>%
column_to_rownames("ID") %>%
t() %>%
DataFrame()
indoor.air <- exposome[,c("ID", as.character(codebook$variable_name[codebook$domain=="Indoor air"]))] %>%
column_to_rownames("ID") %>%
t() %>%
DataFrame()
lifestyles <- exposome[,c("ID", as.character(codebook$variable_name[codebook$domain=="Lifestyles"]))] %>%
column_to_rownames("ID") %>%
t() %>%
DataFrame()
chemicals <- exposome[,c("ID", as.character(codebook$variable_name[codebook$domain=="Chemicals"]))] %>%
column_to_rownames("ID") %>%
t() %>%
DataFrame()
covariates <- covariates %>%
column_to_rownames("ID") %>%
t() %>%
DataFrame()
phenotype <- phenotype %>% as.data.frame() # use as ColData for MultiAssayExperiment format
row.names(phenotype) <- paste0("X", phenotype$ID)
proteome.d <- proteome@assayData$exprs %>% DataFrame()
proteome.cov <- proteome@phenoData@data
proteome.cov <- proteome.cov[stats::complete.cases(proteome.cov),] %>% t() %>% DataFrame()
metabol_urine.d <- metabol_urine@assayData$exprs %>% DataFrame()
metabol_urine.cov <- metabol_urine@phenoData@data
metabol_urine.cov <- metabol_urine.cov[stats::complete.cases(metabol_urine.cov),] %>% t() %>% DataFrame()
metabol_serum.d <- metabol_serum@assayData$exprs %>% DataFrame()
metabol_serum.cov <- metabol_serum@phenoData@data
metabol_serum.cov <- metabol_serum.cov[stats::complete.cases(metabol_serum.cov),] %>% t() %>% DataFrame()
# note that we do not include the gene expression nor the methylation data in the MultiAssayExperiment object as they are large. We also don't recommend storing genomewide data in this format. However, we include a small (e.g. 1000 SNPs) "genome" germline genetics data as an example.
helix_ma <- MultiAssayExperiment(
experiments= ExperimentList("outdoor.exposures"=outdoor.exposures,
"indoor.air"=indoor.air,
"lifestyles"=lifestyles,
"exposome"=chemicals,
"covariates"=covariates,
"proteome"=proteome.d,
"proteome.cov"=proteome.cov,
"metabol_urine"=metabol_urine.d,
"metabol_urine.cov"=metabol_urine.cov,
"metabol_serum"=metabol_serum.d,
"metabol_serum.cov"=metabol_serum.cov,
"genome"=G),
colData = phenotype)
# clean up after creating MultiAssayExperiment data object
rm(outdoor.exposures)
rm(indoor.air)
rm(lifestyles)
rm(chemicals)
rm(covariates)
rm(proteome.d)
rm(proteome.cov)
rm(metabol_urine.d)
rm(metabol_urine.cov)
rm(metabol_serum.d)
rm(metabol_serum.cov)
rm(G)
# save(helix_ma, file = paste0(work.dir, "/Data/HELIX.MultiAssayExperiment.RData")) # code to save if needed
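The reshaping pattern used above (move `ID` to the row names, transpose, then convert) places features in rows and individuals in columns, which is the layout MultiAssayExperiment expects for each experiment; the column names must then match the row names of the `colData` table. The base R sketch below mimics that pattern on a toy table. The toy values and names are hypothetical, and `data.frame()` is used here on the assumption that it handles names the same way `DataFrame()` does in the code above: numeric IDs that become column names are prefixed with "X", which is presumably why the phenotype row names are built with `paste0("X", ...)`.

```r
# Toy exposure table: one row per individual, numeric IDs
toy <- data.frame(ID = 1:3,
                  exp_a = c(0.1, 0.4, 0.3),
                  exp_b = c(2, 5, 3))

# Move IDs to the row names, then transpose so features are rows
rownames(toy) <- toy$ID
feat <- data.frame(t(toy[, c("exp_a", "exp_b")]))

# Numeric column names are made syntactically valid: "1" becomes "X1", etc.
colnames(feat)

# Toy phenotype table with row names built to match the experiment columns
pheno <- data.frame(ID = 1:3, outcome = c(10, 12, 11))
rownames(pheno) <- paste0("X", pheno$ID)

# The sample identifiers now line up between the two tables
identical(colnames(feat), rownames(pheno))
```

If the identifiers did not line up, the experiments and the phenotype table could not be linked on individuals without supplying an explicit sample map.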
kable(codebook, align="c")
| | variable_name | domain | family | subfamily | period | location | period_postnatal | description | var_type | transformation | labels | labelsshort |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| h_abs_ratio_preg_Log | h_abs_ratio_preg_Log | Outdoor exposures | Air Pollution | PMAbsorbance | Pregnancy | Home | NA | abs value (extrapolated back in time using ratio method)duringpregnancy | numeric | Natural Logarithm | PMabs | PMabs |
| h_no2_ratio_preg_Log | h_no2_ratio_preg_Log | Outdoor exposures | Air Pollution | NO2 | Pregnancy | Home | NA | no2 value (extrapolated back in time using ratio method)during pregnancy | numeric | Natural Logarithm | NO2 | NO2 |
| h_pm10_ratio_preg_None | h_pm10_ratio_preg_None | Outdoor exposures | Air Pollution | PM10 | Pregnancy | Home | NA | pm10 value (extrapolated back in time using ratio method)duringpregnancy | numeric | None | PM10 | PM10 |
| h_pm25_ratio_preg_None | h_pm25_ratio_preg_None | Outdoor exposures | Air Pollution | PM2.5 | Pregnancy | Home | NA | pm25 value (extrapolated back in time using ratio method)duringpregnancy | numeric | None | PM2.5 | PM2.5 |
| hs_no2_dy_hs_h_Log | hs_no2_dy_hs_h_Log | Outdoor exposures | Air Pollution | NO2 | Postnatal | Home | Day before examination | no2 value (extrapolated back in time using ratio method)one day before hs test at home | numeric | Natural Logarithm | NO2(day) | NO2(day) |
| hs_no2_wk_hs_h_Log | hs_no2_wk_hs_h_Log | Outdoor exposures | Air Pollution | NO2 | Postnatal | Home | Week before examination | no2 value (extrapolated back in time using ratio method)one week before hs test at home | numeric | Natural Logarithm | NO2(week) | NO2(week) |
| hs_no2_yr_hs_h_Log | hs_no2_yr_hs_h_Log | Outdoor exposures | Air Pollution | NO2 | Postnatal | Home | Year before examination | no2 value (extrapolated back in time using ratio method)one year before hs test at home | numeric | Natural Logarithm | NO2(year) | NO2(year) |
| hs_pm10_dy_hs_h_None | hs_pm10_dy_hs_h_None | Outdoor exposures | Air Pollution | PM10 | Postnatal | Home | Day before examination | pm10 value (extrapolated back in time using ratio method)one day before hs test at home | numeric | None | PM10(day) | PM10(day) |
| hs_pm10_wk_hs_h_None | hs_pm10_wk_hs_h_None | Outdoor exposures | Air Pollution | PM10 | Postnatal | Home | Week before examination | pm10 value (extrapolated back in time using ratio method)one week before hs test at home | numeric | None | PM10(week) | PM10(week) |
| hs_pm10_yr_hs_h_None | hs_pm10_yr_hs_h_None | Outdoor exposures | Air Pollution | PM10 | Postnatal | Home | Year before examination | pm10 value (extrapolated back in time using ratio method)one year before hs test at home | numeric | None | PM10(year) | PM10(year) |
| hs_pm25_dy_hs_h_None | hs_pm25_dy_hs_h_None | Outdoor exposures | Air Pollution | PM2.5 | Postnatal | Home | Day before examination | pm25 value (extrapolated back in time using ratio method)one day before hs test at home | numeric | None | PM2.5(day) | PM2.5(day) |
| hs_pm25_wk_hs_h_None | hs_pm25_wk_hs_h_None | Outdoor exposures | Air Pollution | PM2.5 | Postnatal | Home | Week before examination | pm25 value (extrapolated back in time using ratio method)one week before hs test at home | numeric | None | PM2.5(week) | PM2.5(week) |
| hs_pm25_yr_hs_h_None | hs_pm25_yr_hs_h_None | Outdoor exposures | Air Pollution | PM2.5 | Postnatal | Home | Year before examination | pm25 value (extrapolated back in time using ratio method)one year before hs test at home | numeric | None | PM2.5(year) | PM2.5(year) |
| hs_pm25abs_dy_hs_h_Log | hs_pm25abs_dy_hs_h_Log | Outdoor exposures | Air Pollution | PMAbsorbance | Postnatal | Home | Day before examination | pm25 absorbance value (extrapolated back in time using ratio method)one day before hs test at home | numeric | Natural Logarithm | PMabs(day) | PMabs(day) |
| hs_pm25abs_wk_hs_h_Log | hs_pm25abs_wk_hs_h_Log | Outdoor exposures | Air Pollution | PMAbsorbance | Postnatal | Home | Week before examination | pm25 absorbance value (extrapolated back in time using ratio method)one week before hs test at home | numeric | Natural Logarithm | PMabs(week) | PMabs(week) |
| hs_pm25abs_yr_hs_h_Log | hs_pm25abs_yr_hs_h_Log | Outdoor exposures | Air Pollution | PMAbsorbance | Postnatal | Home | Year before examination | pm25 absorbance value (extrapolated back in time using ratio method)one year before hs test at home | numeric | Natural Logarithm | PMabs(year) | PMabs(year) |
| h_accesslines300_preg_dic0 | h_accesslines300_preg_dic0 | Outdoor exposures | Built environment | Access | Pregnancy | Home | NA | Meters of public transport mode lines (only buses) inside each 300m buffer, divided by the buffer area in km2at pregnancy period | numeric | Dichotomous | Access_ lines | BPTLine |
| h_accesspoints300_preg_Log | h_accesspoints300_preg_Log | Outdoor exposures | Built environment | Access | Pregnancy | Home | NA | Number of bus public transport mode stops inside each 300m buffer, divided by the buffer area in km2at pregnancy period | numeric | Natural Logarithm | Access_stops | BPTStop |
| h_builtdens300_preg_Sqrt | h_builtdens300_preg_Sqrt | Outdoor exposures | Built environment | Building density | Pregnancy | Home | NA | Building density (m2 built/km2) within a buffers of 300mat pregnancy period | numeric | Square root | Building | BuildDens |
| h_connind300_preg_Sqrt | h_connind300_preg_Sqrt | Outdoor exposures | Built environment | Connectivity | Pregnancy | Home | NA | Connectivity density (number of intersections / km2) within a buffer of 300mat pregnancy period | numeric | Square root | Connectivity | Connec |
| h_fdensity300_preg_Log | h_fdensity300_preg_Log | Outdoor exposures | Built environment | Facility | Pregnancy | Home | NA | Number of facilities present divided by the area of the 300 meters buffer at pregnancy period | numeric | Natural Logarithm | Facility_dens | FacDens |
| h_frichness300_preg_None | h_frichness300_preg_None | Outdoor exposures | Built environment | Facility | Pregnancy | Home | NA | Number of different facility types present divided by the maximum potential number of facility types (at a 300m buffer)at pregnancy period | numeric | None | Facility_rich | FacRich |
| h_landuseshan300_preg_None | h_landuseshan300_preg_None | Outdoor exposures | Built environment | Land use | Pregnancy | Home | NA | Landuse Shannon’s Evenness Indexat pregnancy period | numeric | None | Land use | Land use |
| h_popdens_preg_Sqrt | h_popdens_preg_Sqrt | Outdoor exposures | Built environment | Population | Pregnancy | Home | NA | population densityat pregnancy period | numeric | Square root | Population | Pop |
| h_walkability_mean_preg_None | h_walkability_mean_preg_None | Outdoor exposures | Built environment | Walkability | Pregnancy | Home | NA | Walkability index (as mean of deciles of facility richness index, landuse shannon’s Evenness Index, population density, connectivity density)at pregnancy period | numeric | None | Walkability | Walkability |
| hs_accesslines300_h_dic0 | hs_accesslines300_h_dic0 | Outdoor exposures | Built environment | Access | Postnatal | Home | NA | Meters of public transport mode lines (only buses) inside each 300m buffer, divided by the buffer area in km2at home | numeric | Dichotomous | Access_ lines_home | BPTLineH |
| hs_accesspoints300_h_Log | hs_accesspoints300_h_Log | Outdoor exposures | Built environment | Access | Postnatal | Home | NA | Number of bus public transport mode stops inside each 300m buffer, divided by the buffer area in km2at home | numeric | Natural Logarithm | Access_stops_home | BPTStopH |
| hs_builtdens300_h_Sqrt | hs_builtdens300_h_Sqrt | Outdoor exposures | Built environment | Building density | Postnatal | Home | NA | Building density (m2 built/km2) within a buffers of 300mat home | numeric | Square root | Building_home | BuildH |
| hs_connind300_h_Log | hs_connind300_h_Log | Outdoor exposures | Built environment | Connectivity | Postnatal | Home | NA | Connectivity density (number of intersections / km2) within a buffer of 300mat home | numeric | Natural Logarithm | Connectivity | ConnH |
| hs_fdensity300_h_Log | hs_fdensity300_h_Log | Outdoor exposures | Built environment | Facility | Postnatal | Home | NA | Number of facilities present divided by the area of the 300 meters buffer at home | numeric | Natural Logarithm | Facility_dens | FacDenH |
| hs_landuseshan300_h_None | hs_landuseshan300_h_None | Outdoor exposures | Built environment | Land use | Postnatal | Home | NA | Landuse Shannon’s Evenness Indexat home | numeric | None | Land use | Land useH |
| hs_popdens_h_Sqrt | hs_popdens_h_Sqrt | Outdoor exposures | Built environment | Population | Postnatal | Home | NA | population densityat home | numeric | Square root | Population | Population |
| hs_walkability_mean_h_None | hs_walkability_mean_h_None | Outdoor exposures | Built environment | Walkability | Postnatal | Home | NA | walkability index (as mean of deciles of facility richness index, landuse shannon’s Evenness Index, population density, connectivity density)at home | numeric | None | Walkability | Walkability |
| hs_accesslines300_s_dic0 | hs_accesslines300_s_dic0 | Outdoor exposures | Built environment | Access | Postnatal | School | NA | Meters of public transport mode lines (only buses) inside each 300m buffer, divided by the buffer area in km2at school | numeric | Dichotomous | Access_ lines_school | BPTLineS |
| hs_accesspoints300_s_Log | hs_accesspoints300_s_Log | Outdoor exposures | Built environment | Access | Postnatal | School | NA | Number of bus public transport mode stops inside each 300m buffer, divided by the buffer area in km2at school | numeric | Natural Logarithm | Access_stops_school | BPTStopS |
| hs_builtdens300_s_Sqrt | hs_builtdens300_s_Sqrt | Outdoor exposures | Built environment | Building density | Postnatal | School | NA | Building density (m2 built/km2) within a buffers of 300mat school | numeric | Square root | Building_school_school | BuildS |
| hs_connind300_s_Log | hs_connind300_s_Log | Outdoor exposures | Built environment | Connectivity | Postnatal | School | NA | Connectivity density (number of intersections / km2) within a buffer of 300mat school | numeric | Natural Logarithm | Connectivity_school | ConnS |
| hs_fdensity300_s_Log | hs_fdensity300_s_Log | Outdoor exposures | Built environment | Facility | Postnatal | School | NA | Number of facilities present divided by the area of the 300 meters buffer at school | numeric | Natural Logarithm | Facility_dens_school | FacDenS |
| hs_landuseshan300_s_None | hs_landuseshan300_s_None | Outdoor exposures | Built environment | Land use | Postnatal | School | NA | Landuse Shannon’s Evenness Indexat school | numeric | None | Land use_school | Land useS |
| hs_popdens_s_Sqrt | hs_popdens_s_Sqrt | Outdoor exposures | Built environment | Population | Postnatal | School | NA | population densityat school | numeric | Square root | Population_school | PopS |
| h_Absorbance_Log | h_Absorbance_Log | Indoor air | Indoor air | PM | Postnatal | Home | NA | Concentration of absorbance | numeric | Natural Logarithm | PMabs in | PMabsIN |
| h_Benzene_Log | h_Benzene_Log | Indoor air | Indoor air | BTEX | Postnatal | Home | NA | Concentration of indoor Benzene | numeric | Natural Logarithm | Benzene in | Benzene |
| h_NO2_Log | h_NO2_Log | Indoor air | Indoor air | NO2 | Postnatal | Home | NA | Concentration of indoor NO2 | numeric | Natural Logarithm | NO2 in | NO2IN |
| h_PM_Log | h_PM_Log | Indoor air | Indoor air | PM | Postnatal | Home | NA | Concentration of particulate matter | numeric | Natural Logarithm | PM2.5 in | PM2.5IN |
| h_TEX_Log | h_TEX_Log | Indoor air | Indoor air | BTEX | Postnatal | Home | NA | Concentration of indoor BTEX (sum) | numeric | Natural Logarithm | BTEX in | BTEX |
| e3_alcpreg_yn_None | e3_alcpreg_yn_None | Lifestyles | Lifestyle | Prenatal Alcohol | Pregnancy | NA | NA | alcohol during pregnancy yes/no (0=none or <1/m for KANC) | factor | None | Alcohol | Alcohol |
| h_bfdur_Ter | h_bfdur_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Breastfeeding duration (weeks) | factor | Tertiles | Breastfeeding | Breastfeeding |
| h_cereal_preg_Ter | h_cereal_preg_Ter | Lifestyles | Lifestyle | Diet | Pregnancy | NA | NA | cereal comsumption during pregnancy (times/week) | factor | Tertiles | Cereals | Cereals |
| h_dairy_preg_Ter | h_dairy_preg_Ter | Lifestyles | Lifestyle | Diet | Pregnancy | NA | NA | dairy comsumption during pregnancy (times/week) | factor | Tertiles | Dairy | Dairy |
| h_fastfood_preg_Ter | h_fastfood_preg_Ter | Lifestyles | Lifestyle | Diet | Pregnancy | NA | NA | fast food comsumption during pregnancy (times/week) | factor | Tertiles | Fastfood | Fastfood |
| h_fish_preg_Ter | h_fish_preg_Ter | Lifestyles | Lifestyle | Diet | Pregnancy | NA | NA | fish comsumption during pregnancy (times/week) | factor | Tertiles | Fish | Fish |
| h_folic_t1_None | h_folic_t1_None | Lifestyles | Lifestyle | Folic acid consumption | Pregnancy | NA | NA | folic acid supplementation during pregnancy | factor | None | Folic acid | Folic acid |
| h_fruit_preg_Ter | h_fruit_preg_Ter | Lifestyles | Lifestyle | Diet | Pregnancy | NA | NA | fruit comsumption during pregnancy (times/week) | factor | Tertiles | Fruits | Fruits |
| h_legume_preg_Ter | h_legume_preg_Ter | Lifestyles | Lifestyle | Diet | Pregnancy | NA | NA | legume comsumption during pregnancy (times/week) | factor | Tertiles | Legumes | Legumes |
| h_meat_preg_Ter | h_meat_preg_Ter | Lifestyles | Lifestyle | Diet | Pregnancy | NA | NA | meat comsumption during pregnancy (times/week) | factor | Tertiles | Meat | Meat |
| h_pamod_t3_None | h_pamod_t3_None | Lifestyles | Lifestyle | Physical activity | Pregnancy | NA | NA | Walking and/or cycling acitivity during pregnancy (frequency) | factor | None | PAmoderate | PAModp |
| h_pavig_t3_None | h_pavig_t3_None | Lifestyles | Lifestyle | Physical activity | Pregnancy | NA | NA | Exercise or sport acitivity during pregnancy (frequency) | factor | None | PAvigorous | PAVig |
| h_veg_preg_Ter | h_veg_preg_Ter | Lifestyles | Lifestyle | Diet | Pregnancy | NA | NA | vegetables comsumption during pregnancy (times/week) | factor | Tertiles | Vegetables | Vegetables |
| hs_bakery_prod_Ter | hs_bakery_prod_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: bakery products (hs_cookies + hs_pastries) | factor | Tertiles | Bakery prod | BakeProd |
| hs_beverages_Ter | hs_beverages_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: beverages (hs_dietsoda+hs_soda) | factor | Tertiles | Soda | Soda |
| hs_break_cer_Ter | hs_break_cer_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: breakfast cereal (hs_sugarcer+hs_othcer) | factor | Tertiles | BF cereals | BFcereals |
| hs_caff_drink_Ter | hs_caff_drink_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Drinks a caffeinated or energy drink (eg coca-cola, diet-coke, redbull) | factor | Tertiles | Caffeine | Caffeine |
| hs_dairy_Ter | hs_dairy_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: dairy (hs_cheese + hs_milk + hs_yogurt+ hs_probiotic+ hs_desert) | factor | Tertiles | Dairy | Dairy |
| hs_fastfood_Ter | hs_fastfood_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Visits a fast food restaurant/take away | factor | Tertiles | Fastfood | Fastfood |
| hs_KIDMED_None | hs_KIDMED_None | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Sum of KIDMED indices, without index9 | numeric | None | KIDMED | KIDMED |
| hs_mvpa_prd_alt_None | hs_mvpa_prd_alt_None | Lifestyles | Lifestyle | Physical activity | Postnatal | NA | NA | Clean & Over-reporting of Moderate-to-Vigorous Physical Activity (min/day) | numeric | None | PA | PA |
| hs_org_food_Ter | hs_org_food_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Eats organic food | factor | Tertiles | Organicfood | Organicfood |
| hs_pet_cat_r2_None | hs_pet_cat_r2_None | Lifestyles | Lifestyle | Allergens | Postnatal | NA | NA | Do you have any cats that live mainly in your home? | factor | None | Cat_home | Cat |
| hs_pet_dog_r2_None | hs_pet_dog_r2_None | Lifestyles | Lifestyle | Allergens | Postnatal | NA | NA | Do you have any dogs that live mainly in your home? | factor | None | Dog_home | Dog |
| hs_pet_None | hs_pet_None | Lifestyles | Lifestyle | Allergens | Postnatal | NA | NA | Do you have any other pets that live mainly in your home? | factor | None | Other pets_home | Pets |
| hs_proc_meat_Ter | hs_proc_meat_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: processed meat (hs_coldmeat+hs_ham) | factor | Tertiles | Processed meat | ProcMeat |
| hs_readymade_Ter | hs_readymade_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Eats a ready-made supermarket meal | factor | Tertiles | Ready made food | ReadyFood |
| hs_sd_wk_None | hs_sd_wk_None | Lifestyles | Lifestyle | Physical activity | Postnatal | NA | NA | sedentary behaviour (min/day) | numeric | None | Sedentary | Sedentary |
| hs_total_bread_Ter | hs_total_bread_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: bread (hs_darkbread+hs_whbread) | factor | Tertiles | Bread | Bread |
| hs_total_cereal_Ter | hs_total_cereal_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: cereal (hs_darkbread + hs_whbread + hs_rice_pasta + hs_sugarcer + hs_othcer + hs_rusks) | factor | Tertiles | Cereals | Cereals |
| hs_total_fish_Ter | hs_total_fish_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: fish and seafood (hs_canfish+hs_oilyfish+hs_whfish+hs_seafood) | factor | Tertiles | Fish | Fish |
| hs_total_fruits_Ter | hs_total_fruits_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: fruits (hs_canfruit+hs_dryfruit+hs_freshjuice+hs_fruits) | factor | Tertiles | Fruits | Fruits |
| hs_total_lipids_Ter | hs_total_lipids_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: Added fat | factor | Tertiles | Diet fat | Diet fat |
| hs_total_meat_Ter | hs_total_meat_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: meat (hs_coldmeat+hs_ham+hs_poultry+hs_redmeat) | factor | Tertiles | Meat | Meat |
| hs_total_potatoes_Ter | hs_total_potatoes_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: potatoes (hs_frenchfries+hs_potatoes) | factor | Tertiles | Potatoes | Potatoes |
| hs_total_sweets_Ter | hs_total_sweets_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: sweets (hs_choco + hs_sweets + hs_sugar) | factor | Tertiles | Sweets | Sweets |
| hs_total_veg_Ter | hs_total_veg_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: vegetables (hs_cookveg+hs_rawveg) | factor | Tertiles | Vegetables | Vegetables |
| hs_total_yog_Ter | hs_total_yog_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: yogurt (hs_yogurt+hs_probiotic) | factor | Tertiles | Yogurt | Yogurt |
| hs_dif_hours_total_None | hs_dif_hours_total_None | Lifestyles | Lifestyle | Sleep | Postnatal | NA | NA | Total hours of sleep (mean weekdays and night) | numeric | None | Sleep | Sleep |
| hs_as_c_Log2 | hs_as_c_Log2 | Chemicals | Metals | As | Postnatal | NA | NA | Arsenic (As) in child | numeric | Logarithm base 2 | As | As |
| hs_as_m_Log2 | hs_as_m_Log2 | Chemicals | Metals | As | Pregnancy | NA | NA | Arsenic (As) in mother | numeric | Logarithm base 2 | As | As |
| hs_cd_c_Log2 | hs_cd_c_Log2 | Chemicals | Metals | Cd | Postnatal | NA | NA | Cadmium (Cd) in child | numeric | Logarithm base 2 | Cd | Cd |
| hs_cd_m_Log2 | hs_cd_m_Log2 | Chemicals | Metals | Cd | Pregnancy | NA | NA | Cadmium (Cd) in mother | numeric | Logarithm base 2 | Cd | Cd |
| hs_co_c_Log2 | hs_co_c_Log2 | Chemicals | Metals | Co | Postnatal | NA | NA | Cobalt (Co) in child | numeric | Logarithm base 2 | Co | Co |
| hs_co_m_Log2 | hs_co_m_Log2 | Chemicals | Metals | Co | Pregnancy | NA | NA | Cobalt (Co) in mother | numeric | Logarithm base 2 | Co | Co |
| hs_cs_c_Log2 | hs_cs_c_Log2 | Chemicals | Metals | Cs | Postnatal | NA | NA | Caesium (Cs) in child | numeric | Logarithm base 2 | Cs | Cs |
| hs_cs_m_Log2 | hs_cs_m_Log2 | Chemicals | Metals | Cs | Pregnancy | NA | NA | Caesium (Cs) in mother | numeric | Logarithm base 2 | Cs | Cs |
| hs_cu_c_Log2 | hs_cu_c_Log2 | Chemicals | Metals | Cu | Postnatal | NA | NA | Copper (Cu) in child | numeric | Logarithm base 2 | Cu | Cu |
| hs_cu_m_Log2 | hs_cu_m_Log2 | Chemicals | Metals | Cu | Pregnancy | NA | NA | Copper (Cu) in mother | numeric | Logarithm base 2 | Cu | Cu |
| hs_hg_c_Log2 | hs_hg_c_Log2 | Chemicals | Metals | Hg | Postnatal | NA | NA | Mercury (Hg) in child | numeric | Logarithm base 2 | Hg | Hg |
| hs_hg_m_Log2 | hs_hg_m_Log2 | Chemicals | Metals | Hg | Pregnancy | NA | NA | Mercury (Hg) in mother | numeric | Logarithm base 2 | Hg | Hg |
| hs_mn_c_Log2 | hs_mn_c_Log2 | Chemicals | Metals | Mn | Postnatal | NA | NA | Manganese (Mn) in child | numeric | Logarithm base 2 | Mn | Mn |
| hs_mn_m_Log2 | hs_mn_m_Log2 | Chemicals | Metals | Mn | Pregnancy | NA | NA | Manganese (Mn) in mother | numeric | Logarithm base 2 | Mn | Mn |
| hs_mo_c_Log2 | hs_mo_c_Log2 | Chemicals | Metals | Mo | Postnatal | NA | NA | Molybdenum (Mo) in child | numeric | Logarithm base 2 | Mo | Mo |
| hs_mo_m_Log2 | hs_mo_m_Log2 | Chemicals | Metals | Mo | Pregnancy | NA | NA | Molybdenum (Mo) in mother | numeric | Logarithm base 2 | Mo | Mo |
| hs_pb_c_Log2 | hs_pb_c_Log2 | Chemicals | Metals | Pb | Postnatal | NA | NA | Lead (Pb) in child | numeric | Logarithm base 2 | Pb | Pb |
| hs_pb_m_Log2 | hs_pb_m_Log2 | Chemicals | Metals | Pb | Pregnancy | NA | NA | Lead (Pb) in mother | numeric | Logarithm base 2 | Pb | Pb |
| hs_tl_cdich_None | hs_tl_cdich_None | Chemicals | Metals | Tl | Postnatal | NA | NA | Dichotomous variable of thallium (Tl) in child | factor | None | Tl | Tl |
| hs_tl_mdich_None | hs_tl_mdich_None | Chemicals | Metals | Tl | Pregnancy | NA | NA | Dichotomous variable of thallium (Tl) in mother | factor | None | Tl | Tl |
| h_humidity_preg_None | h_humidity_preg_None | Outdoor exposures | Meteorological | Humidity | Pregnancy | Home | NA | Humidity average during pregnancy | numeric | None | Hum. | Hum |
| h_pressure_preg_None | h_pressure_preg_None | Outdoor exposures | Meteorological | Pressure | Pregnancy | Home | NA | Pressure average during pregnancy | numeric | None | Pres. | Pres |
| h_temperature_preg_None | h_temperature_preg_None | Outdoor exposures | Meteorological | Temperature | Pregnancy | Home | NA | Temperature average during pregnancy | numeric | None | T | T |
| hs_hum_mt_hs_h_None | hs_hum_mt_hs_h_None | Outdoor exposures | Meteorological | Humidity | Postnatal | Home | Month before examination | Relative humidity one month before at home | numeric | None | Hum.(month) | Hum.(month) |
| hs_tm_mt_hs_h_None | hs_tm_mt_hs_h_None | Outdoor exposures | Meteorological | Temperature | Postnatal | Home | Month before examination | Mean temperature one month before at home | numeric | None | T(month) | T(month) |
| hs_uvdvf_mt_hs_h_None | hs_uvdvf_mt_hs_h_None | Outdoor exposures | Meteorological | UV | Postnatal | Home | Month before examination | Vitamin-D UV dose per subject one month before at home | numeric | None | UV(month) | UV(month) |
| hs_hum_dy_hs_h_None | hs_hum_dy_hs_h_None | Outdoor exposures | Meteorological | Humidity | Postnatal | Home | Day before examination | Relative humidity one day before at home | numeric | None | T(day) | T(day) |
| hs_hum_wk_hs_h_None | hs_hum_wk_hs_h_None | Outdoor exposures | Meteorological | Humidity | Postnatal | Home | Week before examination | Relative humidity one week before at home | numeric | None | Hum.(week) | Hum.(week) |
| hs_tm_dy_hs_h_None | hs_tm_dy_hs_h_None | Outdoor exposures | Meteorological | Temperature | Postnatal | Home | Day before examination | Mean temperature one day before at home | numeric | None | T(day) | T(day) |
| hs_tm_wk_hs_h_None | hs_tm_wk_hs_h_None | Outdoor exposures | Meteorological | Temperature | Postnatal | Home | Week before examination | Mean temperature one week before at home | numeric | None | T(week) | T(week) |
| hs_uvdvf_dy_hs_h_None | hs_uvdvf_dy_hs_h_None | Outdoor exposures | Meteorological | UV | Postnatal | Home | Day before examination | Vitamin-D UV dose per subject one day before at home | numeric | None | UV(day) | UV(day) |
| hs_uvdvf_wk_hs_h_None | hs_uvdvf_wk_hs_h_None | Outdoor exposures | Meteorological | UV | Postnatal | Home | Week before examination | Vitamin-D UV dose per subject one week before at home | numeric | None | UV(week) | UV(week) |
| hs_blueyn300_s_None | hs_blueyn300_s_None | Outdoor exposures | Natural Spaces | Blue | Postnatal | School | NA | Is there a bluespace in a distance of 300m? at school | factor | None | Blue_school | BlueS |
| h_blueyn300_preg_None | h_blueyn300_preg_None | Outdoor exposures | Natural Spaces | Blue | Pregnancy | Home | NA | Is there a bluespace in a distance of 300m? at pregnancy period | factor | None | Blue space | Blue |
| h_greenyn300_preg_None | h_greenyn300_preg_None | Outdoor exposures | Natural Spaces | Green | Pregnancy | Home | NA | Is there a greenspace in a distance of 300m? at pregnancy period | factor | None | Green space | Green |
| h_ndvi100_preg_None | h_ndvi100_preg_None | Outdoor exposures | Natural Spaces | NDVI | Pregnancy | Home | NA | Average of NDVI values within a buffer of 100m at pregnancy period | numeric | None | NDVI | NDVI |
| hs_greenyn300_s_None | hs_greenyn300_s_None | Outdoor exposures | Natural Spaces | Green | Postnatal | School | NA | Is there a greenspace in a distance of 300m? at school | factor | None | Green_school | GreenS |
| hs_blueyn300_h_None | hs_blueyn300_h_None | Outdoor exposures | Natural Spaces | Blue | Postnatal | Home | NA | Is there a bluespace in a distance of 300m? at home | factor | None | Blue_home | BlueH |
| hs_greenyn300_h_None | hs_greenyn300_h_None | Outdoor exposures | Natural Spaces | Green | Postnatal | Home | NA | Is there a greenspace in a distance of 300m? at home | factor | None | Green_home | GreenH |
| hs_ndvi100_h_None | hs_ndvi100_h_None | Outdoor exposures | Natural Spaces | NDVI | Postnatal | Home | NA | Average of NDVI values within a buffer of 100m at home | numeric | None | NDVI_home | NDVIH |
| hs_ndvi100_s_None | hs_ndvi100_s_None | Outdoor exposures | Natural Spaces | NDVI | Postnatal | School | NA | Average of NDVI values within a buffer of 100m at school | numeric | None | NDVI_school | NDVIS |
| h_lden_cat_preg_None | h_lden_cat_preg_None | Outdoor exposures | Noise | Noise | Pregnancy | Home | NA | Categorized lden (day, evening, night) at pregnancy period | numeric | None | Traffic noise_24h | Noise |
| hs_ln_cat_h_None | hs_ln_cat_h_None | Outdoor exposures | Noise | Noise | Postnatal | Home | NA | Categorized ln (night) at home | factor | None | Traffic noise_night | NoiseNight |
| hs_lden_cat_s_None | hs_lden_cat_s_None | Outdoor exposures | Noise | Noise | Postnatal | School | NA | Categorized lden (one day, evening, night) at school | factor | None | Traffic noise_24h school | NoiseS |
| hs_dde_cadj_Log2 | hs_dde_cadj_Log2 | Chemicals | Organochlorines | DDE | Postnatal | NA | NA | Dichlorodiphenyldichloroethylene (DDE) in child adjusted for lipids | numeric | Logarithm base 2 | DDE | DDE |
| hs_dde_madj_Log2 | hs_dde_madj_Log2 | Chemicals | Organochlorines | DDE | Pregnancy | NA | NA | Dichlorodiphenyldichloroethylene (DDE) in mother adjusted for lipids | numeric | Logarithm base 2 | DDE | DDE |
| hs_ddt_cadj_Log2 | hs_ddt_cadj_Log2 | Chemicals | Organochlorines | DDT | Postnatal | NA | NA | Dichlorodiphenyltrichloroethane (DDT) in child adjusted for lipids | numeric | Logarithm base 2 | DDT | DDT |
| hs_ddt_madj_Log2 | hs_ddt_madj_Log2 | Chemicals | Organochlorines | DDT | Pregnancy | NA | NA | Dichlorodiphenyltrichloroethane (DDT) in mother adjusted for lipids | numeric | Logarithm base 2 | DDT | DDT |
| hs_hcb_cadj_Log2 | hs_hcb_cadj_Log2 | Chemicals | Organochlorines | HCB | Postnatal | NA | NA | Hexachlorobenzene (HCB) in child adjusted for lipids | numeric | Logarithm base 2 | HCB | HCB |
| hs_hcb_madj_Log2 | hs_hcb_madj_Log2 | Chemicals | Organochlorines | HCB | Pregnancy | NA | NA | Hexachlorobenzene (HCB) in mother adjusted for lipids | numeric | Logarithm base 2 | HCB | HCB |
| hs_pcb118_cadj_Log2 | hs_pcb118_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-118 (PCB-118) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 118 | PCB118 |
| hs_pcb118_madj_Log2 | hs_pcb118_madj_Log2 | Chemicals | Organochlorines | PCBs | Pregnancy | NA | NA | Polychlorinated biphenyl-118 (PCB-118) in mother adjusted for lipids | numeric | Logarithm base 2 | PCB 118 | PCB118 |
| hs_pcb138_cadj_Log2 | hs_pcb138_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-138 (PCB-138) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 138 | PCB138 |
| hs_pcb138_madj_Log2 | hs_pcb138_madj_Log2 | Chemicals | Organochlorines | PCBs | Pregnancy | NA | NA | Polychlorinated biphenyl-138 (PCB-138) in mother adjusted for lipids | numeric | Logarithm base 2 | PCB 138 | PCB138 |
| hs_pcb153_cadj_Log2 | hs_pcb153_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-153 (PCB-153) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 153 | PCB153 |
| hs_pcb153_madj_Log2 | hs_pcb153_madj_Log2 | Chemicals | Organochlorines | PCBs | Pregnancy | NA | NA | Polychlorinated biphenyl-153 (PCB-153) in mother adjusted for lipids | numeric | Logarithm base 2 | PCB 153 | PCB153 |
| hs_pcb170_cadj_Log2 | hs_pcb170_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-170 (PCB-170) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 170 | PCB170 |
| hs_pcb170_madj_Log2 | hs_pcb170_madj_Log2 | Chemicals | Organochlorines | PCBs | Pregnancy | NA | NA | Polychlorinated biphenyl-170 (PCB-170) in mother adjusted for lipids | numeric | Logarithm base 2 | PCB 170 | PCB170 |
| hs_pcb180_cadj_Log2 | hs_pcb180_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-180 (PCB-180) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 180 | PCB180 |
| hs_pcb180_madj_Log2 | hs_pcb180_madj_Log2 | Chemicals | Organochlorines | PCBs | Pregnancy | NA | NA | Polychlorinated biphenyl-180 (PCB-180) in mother adjusted for lipids | numeric | Logarithm base 2 | PCB 180 | PCB180 |
| hs_sumPCBs5_cadj_Log2 | hs_sumPCBs5_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Sum of PCBs in child adjusted for lipids (4 cohorts) | numeric | Logarithm base 2 | PCBs | SumPCB |
| hs_sumPCBs5_madj_Log2 | hs_sumPCBs5_madj_Log2 | Chemicals | Organochlorines | PCBs | Pregnancy | NA | NA | Sum of PCBs in mother adjusted for lipids (5 cohorts) | numeric | Logarithm base 2 | PCBs | SumPCB |
| hs_dep_cadj_Log2 | hs_dep_cadj_Log2 | Chemicals | Organophosphate pesticides | DEP | Postnatal | NA | NA | Diethyl phosphate (DEP) in child adjusted for creatinine | numeric | Logarithm base 2 | DEP | DEP |
| hs_dep_madj_Log2 | hs_dep_madj_Log2 | Chemicals | Organophosphate pesticides | DEP | Pregnancy | NA | NA | Diethyl phosphate (DEP) in mother adjusted for creatinine | numeric | Logarithm base 2 | DEP | DEP |
| hs_detp_cadj_Log2 | hs_detp_cadj_Log2 | Chemicals | Organophosphate pesticides | DETP | Postnatal | NA | NA | Diethyl thiophosphate (DETP) in child adjusted for creatinine | numeric | Logarithm base 2 | DETP | DETP |
| hs_detp_madj_Log2 | hs_detp_madj_Log2 | Chemicals | Organophosphate pesticides | DETP | Pregnancy | NA | NA | Diethyl thiophosphate (DETP) in mother adjusted for creatinine | numeric | Logarithm base 2 | DETP | DETP |
| hs_dmdtp_cdich_None | hs_dmdtp_cdich_None | Chemicals | Organophosphate pesticides | DMDTP | Postnatal | NA | NA | Dichotomous variable of dimethyl dithiophosphate (DMDTP) in child | factor | None | DMDTP | DMDTP |
| hs_dmp_cadj_Log2 | hs_dmp_cadj_Log2 | Chemicals | Organophosphate pesticides | DMP | Postnatal | NA | NA | Dimethyl phosphate (DMP) in child adjusted for creatinine | numeric | Logarithm base 2 | DMP | DMP |
| hs_dmp_madj_Log2 | hs_dmp_madj_Log2 | Chemicals | Organophosphate pesticides | DMP | Pregnancy | NA | NA | Dimethyl phosphate (DMP) in mother adjusted for creatinine | numeric | Logarithm base 2 | DMP | DMP |
| hs_dmtp_cadj_Log2 | hs_dmtp_cadj_Log2 | Chemicals | Organophosphate pesticides | DMTP | Postnatal | NA | NA | Dimethyl thiophosphate (DMTP) in child adjusted for creatinine | numeric | Logarithm base 2 | DMDTP | DMTP |
| hs_dmtp_madj_Log2 | hs_dmtp_madj_Log2 | Chemicals | Organophosphate pesticides | DMTP | Pregnancy | NA | NA | Dimethyl thiophosphate (DMTP) in mother adjusted for creatinine | numeric | Logarithm base 2 | DMDTP | DMTP |
| hs_pbde153_cadj_Log2 | hs_pbde153_cadj_Log2 | Chemicals | Polybrominated diphenyl ethers (PBDE) | PBDE153 | Postnatal | NA | NA | Polybrominated diphenyl ether-153 (PBDE-153) in child adjusted for lipids | numeric | Logarithm base 2 | PBDE 153 | PBDE153 |
| hs_pbde153_madj_Log2 | hs_pbde153_madj_Log2 | Chemicals | Polybrominated diphenyl ethers (PBDE) | PBDE153 | Pregnancy | NA | NA | Polybrominated diphenyl ether-153 (PBDE-153) in mother adjusted for lipids | numeric | Logarithm base 2 | PBDE 153 | PBDE153 |
| hs_pbde47_cadj_Log2 | hs_pbde47_cadj_Log2 | Chemicals | Polybrominated diphenyl ethers (PBDE) | PBDE47 | Postnatal | NA | NA | Polybrominated diphenyl ether-47 (PBDE-47) in child adjusted for lipids | numeric | Logarithm base 2 | PBDE 47 | PBDE47 |
| hs_pbde47_madj_Log2 | hs_pbde47_madj_Log2 | Chemicals | Polybrominated diphenyl ethers (PBDE) | PBDE47 | Pregnancy | NA | NA | Polybrominated diphenyl ether-47 (PBDE-47) in mother adjusted for lipids | numeric | Logarithm base 2 | PBDE 47 | PBDE47 |
| hs_pfhxs_c_Log2 | hs_pfhxs_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFHXS | Postnatal | NA | NA | Perfluorohexane sulfonate (PFHXS) in child | numeric | Logarithm base 2 | PFHXS | PFHXS |
| hs_pfhxs_m_Log2 | hs_pfhxs_m_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFHXS | Pregnancy | NA | NA | Perfluorohexane sulfonate (PFHXS) in mother | numeric | Logarithm base 2 | PFHXS | PFHXS |
| hs_pfna_c_Log2 | hs_pfna_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFNA | Postnatal | NA | NA | Perfluorononanoate (PFNA) in child | numeric | Logarithm base 2 | PFNA | PFNA |
| hs_pfna_m_Log2 | hs_pfna_m_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFNA | Pregnancy | NA | NA | Perfluorononanoate (PFNA) in mother | numeric | Logarithm base 2 | PFNA | PFNA |
| hs_pfoa_c_Log2 | hs_pfoa_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFOA | Postnatal | NA | NA | Perfluorooctanoate (PFOA) in child | numeric | Logarithm base 2 | PFOA | PFOA |
| hs_pfoa_m_Log2 | hs_pfoa_m_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFOA | Pregnancy | NA | NA | Perfluorooctanoate (PFOA) in mother | numeric | Logarithm base 2 | PFOA | PFOA |
| hs_pfos_c_Log2 | hs_pfos_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFOS | Postnatal | NA | NA | Perfluorooctane sulfonate (PFOS) in child | numeric | Logarithm base 2 | PFOS | PFOS |
| hs_pfos_m_Log2 | hs_pfos_m_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFOS | Pregnancy | NA | NA | Perfluorooctane sulfonate (PFOS) in mother | numeric | Logarithm base 2 | PFOS | PFOS |
| hs_pfunda_c_Log2 | hs_pfunda_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFUNDA | Postnatal | NA | NA | Perfluoroundecanoate (PFUNDA) in child | numeric | Logarithm base 2 | PFUNDA | PFUNDA |
| hs_pfunda_m_Log2 | hs_pfunda_m_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFUNDA | Pregnancy | NA | NA | Perfluoroundecanoate (PFUNDA) in mother | numeric | Logarithm base 2 | PFUNDA | PFUNDA |
| hs_bpa_cadj_Log2 | hs_bpa_cadj_Log2 | Chemicals | Phenols | BPA | Postnatal | NA | NA | Bisphenol A (BPA) in child adjusted for creatinine | numeric | Logarithm base 2 | BPA | BPA |
| hs_bpa_madj_Log2 | hs_bpa_madj_Log2 | Chemicals | Phenols | BPA | Pregnancy | NA | NA | Bisphenol A (BPA) in mother adjusted for creatinine | numeric | Logarithm base 2 | BPA | BPA |
| hs_bupa_cadj_Log2 | hs_bupa_cadj_Log2 | Chemicals | Phenols | BUPA | Postnatal | NA | NA | N-Butyl paraben (BUPA) in child adjusted for creatinine | numeric | Logarithm base 2 | BUPA | BUPA |
| hs_bupa_madj_Log2 | hs_bupa_madj_Log2 | Chemicals | Phenols | BUPA | Pregnancy | NA | NA | N-Butyl paraben (BUPA) in mother adjusted for creatinine | numeric | Logarithm base 2 | BUPA | BUPA |
| hs_etpa_cadj_Log2 | hs_etpa_cadj_Log2 | Chemicals | Phenols | ETPA | Postnatal | NA | NA | Ethyl paraben (ETPA) in child adjusted for creatinine | numeric | Logarithm base 2 | ETPA | ETPA |
| hs_etpa_madj_Log2 | hs_etpa_madj_Log2 | Chemicals | Phenols | ETPA | Pregnancy | NA | NA | Ethyl paraben (ETPA) in mother adjusted for creatinine | numeric | Logarithm base 2 | ETPA | ETPA |
| hs_mepa_cadj_Log2 | hs_mepa_cadj_Log2 | Chemicals | Phenols | MEPA | Postnatal | NA | NA | Methyl paraben (MEPA) in child adjusted for creatinine | numeric | Logarithm base 2 | MEPA | MEPA |
| hs_mepa_madj_Log2 | hs_mepa_madj_Log2 | Chemicals | Phenols | MEPA | Pregnancy | NA | NA | Methyl paraben (MEPA) in mother adjusted for creatinine | numeric | Logarithm base 2 | MEPA | MEPA |
| hs_oxbe_cadj_Log2 | hs_oxbe_cadj_Log2 | Chemicals | Phenols | OXBE | Postnatal | NA | NA | Oxybenzone (OXBE) in child adjusted for creatinine | numeric | Logarithm base 2 | OXBE | OXBE |
| hs_oxbe_madj_Log2 | hs_oxbe_madj_Log2 | Chemicals | Phenols | OXBE | Pregnancy | NA | NA | Oxybenzone (OXBE) in mother adjusted for creatinine | numeric | Logarithm base 2 | OXBE | OXBE |
| hs_prpa_cadj_Log2 | hs_prpa_cadj_Log2 | Chemicals | Phenols | PRPA | Postnatal | NA | NA | Propyl paraben (PRPA) in child adjusted for creatinine | numeric | Logarithm base 2 | PRPA | PRPA |
| hs_prpa_madj_Log2 | hs_prpa_madj_Log2 | Chemicals | Phenols | PRPA | Pregnancy | NA | NA | Propyl paraben (PRPA) in mother adjusted for creatinine | numeric | Logarithm base 2 | PRPA | PRPA |
| hs_trcs_cadj_Log2 | hs_trcs_cadj_Log2 | Chemicals | Phenols | TRCS | Postnatal | NA | NA | Triclosan (TRCS) in child adjusted for creatinine | numeric | Logarithm base 2 | TRCS | TRCS |
| hs_trcs_madj_Log2 | hs_trcs_madj_Log2 | Chemicals | Phenols | TRCS | Pregnancy | NA | NA | Triclosan (TRCS) in mother adjusted for creatinine | numeric | Logarithm base 2 | TRCS | TRCS |
| hs_mbzp_cadj_Log2 | hs_mbzp_cadj_Log2 | Chemicals | Phthalates | MBZP | Postnatal | NA | NA | Mono benzyl phthalate (MBzP) in child adjusted for creatinine | numeric | Logarithm base 2 | MBZP | MBZP |
| hs_mbzp_madj_Log2 | hs_mbzp_madj_Log2 | Chemicals | Phthalates | MBZP | Pregnancy | NA | NA | Mono benzyl phthalate (MBzP) in mother adjusted for creatinine | numeric | Logarithm base 2 | MBZP | MBZP |
| hs_mecpp_cadj_Log2 | hs_mecpp_cadj_Log2 | Chemicals | Phthalates | MECPP | Postnatal | NA | NA | Mono-2-ethyl 5-carboxypentyl phthalate (MECPP) in child adjusted for creatinine | numeric | Logarithm base 2 | MECPP | MECPP |
| hs_mecpp_madj_Log2 | hs_mecpp_madj_Log2 | Chemicals | Phthalates | MECPP | Pregnancy | NA | NA | Mono-2-ethyl 5-carboxypentyl phthalate (MECPP) in mother adjusted for creatinine | numeric | Logarithm base 2 | MECPP | MECPP |
| hs_mehhp_cadj_Log2 | hs_mehhp_cadj_Log2 | Chemicals | Phthalates | MEHHP | Postnatal | NA | NA | Mono-2-ethyl-5-hydroxyhexyl phthalate (MEHHP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEHHP | MEHHP |
| hs_mehhp_madj_Log2 | hs_mehhp_madj_Log2 | Chemicals | Phthalates | MEHHP | Pregnancy | NA | NA | Mono-2-ethyl-5-hydroxyhexyl phthalate (MEHHP) in mother adjusted for creatinine | numeric | Logarithm base 2 | MEHHP | MEHHP |
| hs_mehp_cadj_Log2 | hs_mehp_cadj_Log2 | Chemicals | Phthalates | MEHP | Postnatal | NA | NA | Mono-2-ethylhexyl phthalate (MEHP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEHP | MEHP |
| hs_mehp_madj_Log2 | hs_mehp_madj_Log2 | Chemicals | Phthalates | MEHP | Pregnancy | NA | NA | Mono-2-ethylhexyl phthalate (MEHP) in mother adjusted for creatinine | numeric | Logarithm base 2 | MEHP | MEHP |
| hs_meohp_cadj_Log2 | hs_meohp_cadj_Log2 | Chemicals | Phthalates | MEOHP | Postnatal | NA | NA | Mono-2-ethyl-5-oxohexyl phthalate (MEOHP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEOHP | MEOHP |
| hs_meohp_madj_Log2 | hs_meohp_madj_Log2 | Chemicals | Phthalates | MEOHP | Pregnancy | NA | NA | Mono-2-ethyl-5-oxohexyl phthalate (MEOHP) in mother adjusted for creatinine | numeric | Logarithm base 2 | MEOHP | MEOHP |
| hs_mep_cadj_Log2 | hs_mep_cadj_Log2 | Chemicals | Phthalates | MEP | Postnatal | NA | NA | Monoethyl phthalate (MEP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEP | MEP |
| hs_mep_madj_Log2 | hs_mep_madj_Log2 | Chemicals | Phthalates | MEP | Pregnancy | NA | NA | Monoethyl phthalate (MEP) in mother adjusted for creatinine | numeric | Logarithm base 2 | MEP | MEP |
| hs_mibp_cadj_Log2 | hs_mibp_cadj_Log2 | Chemicals | Phthalates | MIBP | Postnatal | NA | NA | Mono-iso-butyl phthalate (MiBP) in child adjusted for creatinine | numeric | Logarithm base 2 | MIBP | MIBP |
| hs_mibp_madj_Log2 | hs_mibp_madj_Log2 | Chemicals | Phthalates | MIBP | Pregnancy | NA | NA | Mono-iso-butyl phthalate (MiBP) in mother adjusted for creatinine | numeric | Logarithm base 2 | MIBP | MIBP |
| hs_mnbp_cadj_Log2 | hs_mnbp_cadj_Log2 | Chemicals | Phthalates | MNBP | Postnatal | NA | NA | Mono-n-butyl phthalate (MnBP) in child adjusted for creatinine | numeric | Logarithm base 2 | MNBP | MNBP |
| hs_mnbp_madj_Log2 | hs_mnbp_madj_Log2 | Chemicals | Phthalates | MNBP | Pregnancy | NA | NA | Mono-n-butyl phthalate (MnBP) in mother adjusted for creatinine | numeric | Logarithm base 2 | MNBP | MNBP |
| hs_ohminp_cadj_Log2 | hs_ohminp_cadj_Log2 | Chemicals | Phthalates | OHMiNP | Postnatal | NA | NA | Mono-4-methyl-7-hydroxyoctyl phthalate (OHMiNP) in child adjusted for creatinine | numeric | Logarithm base 2 | OHMiNP | OHMiNP |
| hs_ohminp_madj_Log2 | hs_ohminp_madj_Log2 | Chemicals | Phthalates | OHMiNP | Pregnancy | NA | NA | Mono-4-methyl-7-hydroxyoctyl phthalate (OHMiNP) in mother adjusted for creatinine | numeric | Logarithm base 2 | OHMiNP | OHMiNP |
| hs_oxominp_cadj_Log2 | hs_oxominp_cadj_Log2 | Chemicals | Phthalates | OXOMINP | Postnatal | NA | NA | Mono-4-methyl-7-oxooctyl phthalate (OXOMiNP) in child adjusted for creatinine | numeric | Logarithm base 2 | OXOMINP | OXOMINP |
| hs_oxominp_madj_Log2 | hs_oxominp_madj_Log2 | Chemicals | Phthalates | OXOMINP | Pregnancy | NA | NA | Mono-4-methyl-7-oxooctyl phthalate (OXOMiNP) in mother adjusted for creatinine | numeric | Logarithm base 2 | OXOMINP | OXOMINP |
| hs_sumDEHP_cadj_Log2 | hs_sumDEHP_cadj_Log2 | Chemicals | Phthalates | DEHP | Postnatal | NA | NA | Sum of DEHP metabolites (µg/g) in child adjusted for creatinine | numeric | Logarithm base 2 | DEHP | SumDEHP |
| hs_sumDEHP_madj_Log2 | hs_sumDEHP_madj_Log2 | Chemicals | Phthalates | DEHP | Pregnancy | NA | NA | Sum of DEHP metabolites (µg/g) in mother adjusted for creatinine | numeric | Logarithm base 2 | DEHP | SumDEHP |
| FAS_cat_None | FAS_cat_None | Chemicals | Social and economic capital | Economic capital | Postnatal | NA | NA | Family affluence score | factor | None | Family affluence | FamAfl |
| hs_contactfam_3cat_num_None | hs_contactfam_3cat_num_None | Chemicals | Social and economic capital | Social capital | Postnatal | NA | NA | social capital: family friends | factor | None | Social contact | SocCont |
| hs_hm_pers_None | hs_hm_pers_None | Chemicals | Social and economic capital | Social capital | Postnatal | NA | NA | How many people live in your home? | numeric | None | House crowding | HouseCrow |
| hs_participation_3cat_None | hs_participation_3cat_None | Chemicals | Social and economic capital | Social capital | Postnatal | NA | NA | social capital: structural | factor | None | Social participation | SocPartic |
| e3_asmokcigd_p_None | e3_asmokcigd_p_None | Chemicals | Tobacco Smoke | Tobacco Smoke | Pregnancy | NA | NA | maternal active Tobacco Smoke pregnancy mean nb cig/day | numeric | None | Cigarette | Cigarette |
| hs_cotinine_cdich_None | hs_cotinine_cdich_None | Chemicals | Tobacco Smoke | Cotinine | Postnatal | NA | NA | Dichotomous variable of cotinine in child | factor | None | Cotinine | Cotinine |
| hs_cotinine_mcat_None | hs_cotinine_mcat_None | Chemicals | Tobacco Smoke | Cotinine | Pregnancy | NA | NA | Categorical variable of cotinine in mother | factor | None | Cotinine | Cotinine |
| hs_globalexp2_None | hs_globalexp2_None | Chemicals | Tobacco Smoke | Tobacco Smoke | Postnatal | NA | NA | Global exposure of the child to ETS (2 categories) | factor | None | ETS | ETS |
| hs_smk_parents_None | hs_smk_parents_None | Chemicals | Tobacco Smoke | Tobacco Smoke | Postnatal | NA | NA | Tobacco Smoke status of parents (both) | factor | None | Smoking_parents | SmokPar |
| h_distinvnear1_preg_Log | h_distinvnear1_preg_Log | Outdoor exposures | Traffic | Traffic | Pregnancy | Home | NA | Inverse distance to nearest road at pregnancy period | numeric | Natural Logarithm | Distance road | DistRoad |
| h_trafload_preg_pow1over3 | h_trafload_preg_pow1over3 | Outdoor exposures | Traffic | Traffic | Pregnancy | Home | NA | Total traffic load of all roads in 100 m buffer at pregnancy period | numeric | None | Traffic_100m | Traffic |
| h_trafnear_preg_pow1over3 | h_trafnear_preg_pow1over3 | Outdoor exposures | Traffic | Traffic | Pregnancy | Home | NA | Traffic density on nearest road at pregnancy period | numeric | None | Traffic density | TrafDens |
| hs_trafload_h_pow1over3 | hs_trafload_h_pow1over3 | Outdoor exposures | Traffic | Traffic | Postnatal | Home | NA | Total traffic load of all roads in 100 m buffer at home | numeric | None | Trafficload - nearest | TrafNeares |
| hs_trafnear_h_pow1over3 | hs_trafnear_h_pow1over3 | Outdoor exposures | Traffic | Traffic | Postnatal | Home | NA | Traffic density on nearest road at home | numeric | None | Traffic near | DistRoadH |
| h_bro_preg_Log | h_bro_preg_Log | Outdoor exposures | Water DBPs | Water DBPs | Pregnancy | Home | NA | Total concentration of Brominated during pregnancy | numeric | Natural Logarithm | Brom_THMs | Brom |
| h_clf_preg_Log | h_clf_preg_Log | Outdoor exposures | Water DBPs | Water DBPs | Pregnancy | Home | NA | Total concentration of chloroform during pregnancy | numeric | Natural Logarithm | Chloroform | Chloroform |
| h_thm_preg_Log | h_thm_preg_Log | Outdoor exposures | Water DBPs | Water DBPs | Pregnancy | Home | NA | Total concentration of trihalomethanes during pregnancy | numeric | Natural Logarithm | THMs | THMs |
| h_mbmi_None | h_mbmi_None | Covariates | Covariates | Maternal covariate | Pregnancy | NA | NA | Maternal pre-pregnancy body mass index (kg/m2) | numeric | None | Maternal BMI | mBMI |
| hs_c_height_None | hs_c_height_None | Covariates | Covariates | Child covariate | Postnatal | NA | NA | Height of the child at 6-11 years old (m) | numeric | None | Child height | cHeight |
| hs_c_weight_None | hs_c_weight_None | Covariates | Covariates | Child covariate | Postnatal | NA | NA | Weight of the child at 6-11 years old (kg) | numeric | None | Child weight | cWeight |
| hs_wgtgain_None | hs_wgtgain_None | Covariates | Covariates | Maternal covariate | Pregnancy | NA | NA | Maternal weight gain during pregnancy (kg) | numeric | None | Weight gain Preg | Weightgain |
| e3_gac_None | e3_gac_None | Covariates | Covariates | Child covariate | Pregnancy | NA | NA | Gestational age at birth (week) | numeric | None | Gestational age at birth | GestAge |
| e3_sex_None | e3_sex_None | Covariates | Covariates | Child covariate | Pregnancy | NA | NA | Child sex (female / male) | factor | None | Child sex | Sex |
| e3_yearbir_None | e3_yearbir_None | Covariates | Covariates | Child covariate | Pregnancy | NA | NA | Year of birth (2003 to 2009) | factor | None | Year of birth | YearBirth |
| h_age_None | h_age_None | Covariates | Covariates | Maternal covariate | Pregnancy | NA | NA | Maternal age (years) | numeric | None | Maternal age | mAge |
| h_cohort | h_cohort | Covariates | Covariates | Maternal covariate | Pregnancy | NA | NA | Cohort of inclusion (1 to 6) | factor | None | Cohort | Cohort |
| h_edumc_None | h_edumc_None | Covariates | Covariates | Maternal covariate | Pregnancy | NA | NA | Maternal education (1: primary school, 2:secondary school, 3:university degree or higher) | factor | None | Maternal education | mEducation |
| h_native_None | h_native_None | Covariates | Covariates | Child covariate | Pregnancy | NA | NA | Are the parents native from the country of the cohort (0: no native parent, 1:only one native parent, 2: both parents native) | factor | None | Native | Native |
| h_parity_None | h_parity_None | Covariates | Covariates | Maternal covariate | Pregnancy | NA | NA | Parity before index pregnancy (0: nulliparous, 1:primiparous, 2:multiparous) | factor | None | Parity | Parity |
| hs_child_age_None | hs_child_age_None | Covariates | Covariates | Child covariate | Postnatal | NA | NA | Child age at examination (years) | numeric | None | Child age | cAge |
| e3_bw | e3_bw | Phenotype | Phenotype | Outcome at birth | Pregnancy | NA | NA | Child weight at birth (g) | numeric | None | Birthweight | BW |
| hs_asthma | hs_asthma | Phenotype | Phenotype | Outcome at 6-11 years old | Postnatal | NA | NA | Doctor diagnosed asthma (ever) | factor | None | Asthma | Asthma |
| hs_zbmi_who | hs_zbmi_who | Phenotype | Phenotype | Outcome at 6-11 years old | Postnatal | NA | NA | Body mass index z-score at 6-11 years old - WHO reference - Standardized on sex and age | numeric | None | Body mass index z-score | zBMI |
| hs_correct_raven | hs_correct_raven | Phenotype | Phenotype | Outcome at 6-11 years old | Postnatal | NA | NA | Intelligence quotient at 6-11 years old - Total of correct answers at the RAVEN test | numeric | None | Intelligence quotient | IQ |
| hs_Gen_Tot | hs_Gen_Tot | Phenotype | Phenotype | Outcome at 6-11 years old | Postnatal | NA | NA | Neuro behavior - Internalizing and externalizing problems at 6-11 years old - CBCL scale | numeric | None | Behavior | Behavior |
| hs_bmi_c_cat | hs_bmi_c_cat | Phenotype | Phenotype | Outcome at 6-11 years old | Postnatal | NA | NA | Body mass index categories at 6-11 years old - WHO reference (1: Thinness, 2: Normal, 3:Overweight, 4: Obese) | factor | None | Body mass index (cat) | BMI_cat |
upsetSamples(helix_ma, nintersects = 10)
codebook <- read.table(paste0(work.dir, "/Data/codebook.txt"), sep="\t", header=TRUE)
# Outcome (alternatives: "hs_asthma", "hs_zbmi_who", "e3_bw")
outcome.Name <- "hs_bmi_c_cat"
# Covariates
covariate.Names <- c("h_mbmi_None","e3_sex_None","h_age_None","h_cohort","h_edumc_None")
# Exposure related
# exposure.group must equal a value of codebook$family exactly, or "All":
# {"Metals", "Organochlorines", "Organophosphate pesticides", "Polybrominated diphenyl ethers (PBDE)",
#  "Per- and polyfluoroalkyl substances (PFAS)", "Phenols", "Phthalates", "All"}
exposure.group <- "Organochlorines"
if (exposure.group == "All") {
  exposure.Names <- as.character(codebook$variable_name[codebook$domain == "Chemicals"])
} else {
  exposure.Names <- as.character(codebook$variable_name[codebook$family == exposure.group])
}
# Select only mother measures of exposure. Note that the "madj" pattern keeps only the
# adjusted maternal measures; families whose maternal variables end in "_m_Log2"
# (Metals, PFAS) return no variables with this filter.
exposure.Names <- exposure.Names[grep("madj", exposure.Names)]
# Analysis models to run
univariate <- TRUE
ridge <- TRUE
lasso <- TRUE
elasticnet <- TRUE
bayesian.selection <- TRUE
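The selection logic above can be checked on a small stand-in before running it on the full codebook. The sketch below is only an illustration: `toy.codebook` and `select.exposures()` are hypothetical names that are not part of the lab code, and the five rows are a hand-picked subset of the variable names listed in the codebook table.

```r
# Toy codebook with the three columns used by the selection code.
# Hypothetical subset; the real codebook is read from codebook.txt.
toy.codebook <- data.frame(
  variable_name = c("hs_dde_cadj_Log2", "hs_dde_madj_Log2",
                    "hs_as_c_Log2", "hs_as_m_Log2", "h_mbmi_None"),
  domain = c("Chemicals", "Chemicals", "Chemicals", "Chemicals", "Covariates"),
  family = c("Organochlorines", "Organochlorines", "Metals", "Metals", "Covariates"),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the same steps as the configuration code:
# pick a family (or all chemicals), then keep names matching a pattern.
select.exposures <- function(codebook, group, pattern = "madj") {
  if (group == "All") {
    keep <- codebook$domain == "Chemicals"
  } else {
    keep <- codebook$family == group
  }
  nm <- as.character(codebook$variable_name[keep])
  nm[grepl(pattern, nm)]
}

oc.names      <- select.exposures(toy.codebook, "Organochlorines")
metal.names   <- select.exposures(toy.codebook, "Metals")
# An alternative pattern (an assumption, not the lab's choice) that also
# matches maternal variables named "_m_" rather than "_madj_"
metal.names.m <- select.exposures(toy.codebook, "Metals", pattern = "_m(adj)?_")

print(oc.names)
print(metal.names)
print(metal.names.m)
```

With the default `"madj"` pattern the Metals family yields an empty selection, so it is worth printing `exposure.Names` after changing `exposure.group`.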
The idea of the exposome was first discussed by Chris Wild in 2005 (1), who proposed using omic technologies to capture the environmental factors that influence human health and disease. If factors in the environment do impact our health, there should be molecular signatures that reflect this. These signatures, combined with an understanding of environmental drivers (e.g. changes in air pollution), can be used to measure both how external exposures are reflected within the individual and the internal consequences of those exposures. Rappaport and Smith (2) nicely described this motivation: if “…toxic effects are mediated through chemicals that alter critical molecules, cells, and physiological processes inside the body…, exposures are not restricted to chemicals (toxicants) entering the body from air, water, or food, for example, but also include chemicals produced by inflammation, oxidative stress, lipid peroxidation, infections, gut flora, and other natural processes”. Such chemicals can be measured with modern “metabolomic” techniques, using both targeted and untargeted approaches. The challenge for the resulting analysis is often how to identify the independent association of each measured, and often correlated, exposure feature with an outcome of interest, especially in high dimensions.
This lab section provides examples of descriptive statistics to explore the data and implementations of ridge, lasso, elastic net, and Bayesian selection. Because the measured exposure features are often assumed to reflect long-term effects of the environment that precede the outcome and other omic measures, the analysis is often extended to a mediation-type framework.
Figure from (3).
References:
1. Wild, C.P. (2005). Complementing the genome with an “exposome”: the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol Biomarkers Prev 14, 1847-1850.
2. Rappaport, S.M., and Smith, M.T. (2010). Epidemiology. Environment and disease risks. Science 330, 460-461.
3. Vermeulen, R., Schymanski, E.L., Barabasi, A.L., and Miller, G.W. (2020). The exposome and health: Where chemistry meets biology. Science 367, 392-396.
Often in assessing multiple exposures we have several questions or goals of interest:
1) What is the independent effect of each exposure?
2) Do combinations of exposures act in a synergistic manner to increase risk?
3) What is the combined effect when an individual is exposed to a mixture of compounds?
The first goal is often explored via multivariable regression, and we provide an example of this analysis below. The second goal can be explored with interaction analyses (covered within this workshop). The third goal often relies on mixture approaches. These approaches are not the focus of this particular workshop, but the SHARP training program does offer a workshop in this area. See https://www.publichealth.columbia.edu/research/precision-prevention/environmental-mixtures-workshop-applications-environmental-health-studies
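The first goal can be illustrated with a small simulation. The sketch below is not part of the HELIX analysis: it assumes two simulated, correlated exposures (`x1` and `x2`, hypothetical names) where only `x1` affects a binary outcome, and compares the marginal estimate for `x2` with its estimate from a joint (multivariable) logistic regression.

```r
# Illustrative simulation only; all variable names and effect sizes are assumptions
set.seed(1)
n  <- 5000
x1 <- rnorm(n)                                   # exposure with a true effect
x2 <- 0.8 * x1 + sqrt(1 - 0.8^2) * rnorm(n)      # correlated with x1, no direct effect
y  <- rbinom(n, 1, plogis(-1 + 0.5 * x1))        # binary outcome driven by x1 only

# Marginal (one exposure at a time) estimate for x2
marginal.x2 <- coef(glm(y ~ x2, family = binomial))["x2"]

# Joint model: each coefficient is adjusted for the other exposure
joint.fit <- glm(y ~ x1 + x2, family = binomial)
joint.x2  <- coef(joint.fit)["x2"]

round(c(marginal = unname(marginal.x2), joint = unname(joint.x2)), 3)
```

In this setup the marginal estimate for `x2` picks up the effect of `x1` through their correlation, while the joint model targets the independent effect of each exposure.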
load(paste0(work.dir, "/Data/HELIX.MultiAssayExperiment.RData"))
variables <- c(covariate.Names, exposure.Names)
# 1) select variables but keep the MultiAssayExperiment format; 2) intersectColumns() keeps only individuals with complete data; 3) wideFormat() returns a DataFrame
d <- wideFormat(intersectColumns(helix_ma[variables, ,]), colDataCols=outcome.Name)
## harmonizing input:
## removing 11220 sampleMap rows not in names(experiments)
# Create exposure design matrix
X <- as.data.frame(apply(d[,paste("exposome",exposure.Names,sep="_")],2,as.numeric))
names(X) <- exposure.Names
X <- scale(X, center=T, scale=T)
# Create the outcome variable
Y <- d[,outcome.Name] # outcome
if(outcome.Name=="hs_bmi_c_cat") { Y <- ifelse(as.numeric(Y)>=3, 1, 0)}
if(outcome.Name=="e3_bw") { Y <- ifelse(as.numeric(Y)<2500, 1, 0)}
# Create the covariate design matrix
U <- as.data.frame(d[,paste("covariates",covariate.Names,sep="_")])
names(U) <- covariate.Names
U[,c("h_cohort","e3_sex_None","h_edumc_None")] <- lapply(U[,c("h_cohort","e3_sex_None","h_edumc_None")], factor)
U[,c("h_mbmi_None", "h_age_None")] <- lapply(U[,c("h_mbmi_None", "h_age_None")], as.numeric)
U <- model.matrix(as.formula(paste("~-1+", paste(covariate.Names, collapse="+"))), data=U)
# Other variables for analysis
N <- nrow(d) # number of individuals in the analysis
Q <- ncol(U) # number of covariates in the matrix U
P <- ncol(X) # number of exposures in the matrix X
summarytools::view(dfSummary(as.data.frame(X), style = 'grid',
max.distinct.values = 10, plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
| No | Variable | Freqs (% of Valid) | Missing |
|---|---|---|---|
| 1 | hs_dde_madj_Log2 [numeric] | 736 distinct values | 0 (0.0%) |
| 2 | hs_ddt_madj_Log2 [numeric] | 605 distinct values | 0 (0.0%) |
| 3 | hs_hcb_madj_Log2 [numeric] | 739 distinct values | 0 (0.0%) |
| 4 | hs_pcb118_madj_Log2 [numeric] | 580 distinct values | 0 (0.0%) |
| 5 | hs_pcb138_madj_Log2 [numeric] | 743 distinct values | 0 (0.0%) |
| 6 | hs_pcb153_madj_Log2 [numeric] | 745 distinct values | 0 (0.0%) |
| 7 | hs_pcb170_madj_Log2 [numeric] | 584 distinct values | 0 (0.0%) |
| 8 | hs_pcb180_madj_Log2 [numeric] | 744 distinct values | 0 (0.0%) |
| 9 | hs_sumPCBs5_madj_Log2 [numeric] | 592 distinct values | 0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.2.2)
2023-01-09
cormat <- cor(X, use="complete.obs")
corrplot(cormat, type="upper", order="hclust",
col=brewer.pal(n=8, name="RdYlBu"),
title = "",
addCoef.col = "black",
tl.cex=.5, number.cex=.5)
# hierarchical clustering
hc <- t(X) %>%
dist(method = "euclidean") %>% # Compute dissimilarity matrix based on Euclidean space
hclust(method = "ward.D2") # Use Ward's minimum-variance linkage (ward.D2)
# Visualize using factoextra
# Cut in groups and color by groups
fviz_dend(hc, k = 3, # Cut in groups
show_labels = TRUE, cex=0.4,
color_labels_by_k = TRUE, # color labels by groups
rect = TRUE # Add rectangle around groups
)
if(univariate) {
univariate.results <- t(sapply(1:P, FUN=function(p) { # looping over the index p makes it easy to write results to file one exposure at a time
x <- X[,p]
reg <- glm(Y~x+U, family=binomial) # perform logistic regression
s.reg <- summary(reg) # get the summary for the regression
c.reg <- s.reg$coef["x",] # select the coefficients for the exposure
write.table(t(c(exposure.Names[p], c.reg)), file="ExposomeUnivariateResults.txt", append=ifelse(p==1, F, T), quote=F, sep="\t", col.names=ifelse(p==1, T, F), row.names=F)
return(c.reg) # returning the coefficients is fine for a small number of exposures; with many, rely on the file written above to avoid memory issues
}, simplify=T))
univariate.results <- data.frame(exposure.Names,univariate.results)
}
if(univariate) { kable(univariate.results, digits=3, align="c", row.names=FALSE, col.names=c("Exposure","Estimate", "SD","Z statistic", "P-value"))}
| Exposure | Estimate | SD | Z statistic | P-value |
|---|---|---|---|---|
| hs_dde_madj_Log2 | 0.120 | 0.075 | 1.592 | 0.111 |
| hs_ddt_madj_Log2 | 0.016 | 0.075 | 0.208 | 0.835 |
| hs_hcb_madj_Log2 | -0.078 | 0.087 | -0.903 | 0.367 |
| hs_pcb118_madj_Log2 | -0.150 | 0.120 | -1.254 | 0.210 |
| hs_pcb138_madj_Log2 | -0.173 | 0.107 | -1.617 | 0.106 |
| hs_pcb153_madj_Log2 | -0.210 | 0.125 | -1.677 | 0.094 |
| hs_pcb170_madj_Log2 | -0.338 | 0.130 | -2.607 | 0.009 |
| hs_pcb180_madj_Log2 | -0.138 | 0.126 | -1.099 | 0.272 |
| hs_sumPCBs5_madj_Log2 | -0.380 | 0.125 | -3.051 | 0.002 |
neglog.pvalues <- -log10(univariate.results$Pr...z..)
plot(1:nrow(univariate.results), neglog.pvalues,
pch=16, xaxt="n", ylim=c(0, max(neglog.pvalues, 3)),
ylab="-log10(p-value)", xlab="")
text(x=1:nrow(univariate.results), y=par("usr")[3]-0.1, xpd=NA,
labels=univariate.results$exposure.Names, adj=.9, srt=45, cex=.75)
abline(h=-log10(0.05/nrow(univariate.results)), lty=2, lwd=2, col=2)
if(ridge) {
ridge.cv <- cv.glmnet(x=X, y=Y, family="binomial", alpha=0) # alpha=0 is for ridge
ridge.coef <- coef(ridge.cv, s = "lambda.min")
ridge.fit <- glmnet(x=X, y=Y, family="binomial", alpha=0)
}
if(ridge) { plot(ridge.cv) }
if(ridge) {
plot(ridge.fit, xvar="lambda", label=T)
abline(v=log(ridge.cv$lambda.min), lty=2, col="red")
abline(v=log(ridge.cv$lambda.1se), lty=2, col="green")
}
if(ridge) { ridge.coef }
## 10 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -0.86512355
## hs_dde_madj_Log2 0.14168571
## hs_ddt_madj_Log2 0.07171705
## hs_hcb_madj_Log2 0.09740094
## hs_pcb118_madj_Log2 -0.03061129
## hs_pcb138_madj_Log2 -0.02064578
## hs_pcb153_madj_Log2 0.01217756
## hs_pcb170_madj_Log2 -0.07540445
## hs_pcb180_madj_Log2 0.03147135
## hs_sumPCBs5_madj_Log2 -0.10445286
if(lasso) {
lasso.cv <- cv.glmnet(x=X, y=Y, family="binomial", alpha=1) # alpha=1 is for lasso
lasso.coef <- coef(lasso.cv, s = "lambda.min")
lasso.fit <- glmnet(x=X, y=Y, family="binomial", alpha=1)
}
if(lasso) { plot(lasso.cv) }
if(lasso) {
plot(lasso.fit, xvar="lambda", label=T)
abline(v=log(lasso.cv$lambda.min), lty=2, col="red")
abline(v=log(lasso.cv$lambda.1se), lty=2, col="green")
}
if(lasso) { lasso.coef }
## 10 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -0.86857898
## hs_dde_madj_Log2 0.17290757
## hs_ddt_madj_Log2 0.05854869
## hs_hcb_madj_Log2 0.11836118
## hs_pcb118_madj_Log2 .
## hs_pcb138_madj_Log2 .
## hs_pcb153_madj_Log2 .
## hs_pcb170_madj_Log2 -0.04365484
## hs_pcb180_madj_Log2 .
## hs_sumPCBs5_madj_Log2 -0.15671124
if(elasticnet) {
elasticnet.cv <- cv.glmnet(x=X, y=Y, family="binomial", alpha=0.5) # alpha=0.5 is for elastic net
elasticnet.coef <- coef(elasticnet.cv, s = "lambda.min")
elasticnet.fit <- glmnet(x=X, y=Y, family="binomial", alpha=0.5)
}
if(elasticnet) { plot(elasticnet.cv) }
if(elasticnet) {
plot(elasticnet.fit, xvar="lambda", label=T)
abline(v=log(elasticnet.cv$lambda.min), lty=2, col="red")
abline(v=log(elasticnet.cv$lambda.1se), lty=2, col="green")
}
if(elasticnet) { elasticnet.coef }
## 10 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -0.86592174
## hs_dde_madj_Log2 0.16260223
## hs_ddt_madj_Log2 0.05398904
## hs_hcb_madj_Log2 0.10562583
## hs_pcb118_madj_Log2 .
## hs_pcb138_madj_Log2 .
## hs_pcb153_madj_Log2 .
## hs_pcb170_madj_Log2 -0.04773519
## hs_pcb180_madj_Log2 .
## hs_sumPCBs5_madj_Log2 -0.13862298
if(bayesian.selection) {
U <- U[,2:ncol(U)] # drops the first column of U (h_mbmi_None, given the covariate order above)
reg.bas <- bas.glm(Y~X+U, family = binomial(link = "logit"),
betaprior = bic.prior(), modelprior=beta.binomial(1,P),
include.always = ~U)
coef.bas <- coef(reg.bas, estimator="BMA")
coef.r <- data.frame(c("Intercept", exposure.Names, names(as.data.frame(U))), coef.bas$postmean, coef.bas$postsd,coef.bas$probne0)
names(coef.r) <- c("Variable", "Estimate", "Standard Deviation", "Pr(B!=0)")
}
if(bayesian.selection) {
plot(reg.bas, which=c(4))
}
if(bayesian.selection) {
kable(coef.r, digits=3, align="c", row.names=FALSE)
}
| Variable | Estimate | Standard Deviation | Pr(B!=0) |
|---|---|---|---|
| Intercept | -2.314 | 0.594 | 1.000 |
| hs_dde_madj_Log2 | 0.002 | 0.020 | 0.018 |
| hs_ddt_madj_Log2 | 0.000 | 0.005 | 0.004 |
| hs_hcb_madj_Log2 | 0.000 | 0.004 | 0.002 |
| hs_pcb118_madj_Log2 | 0.000 | 0.013 | 0.007 |
| hs_pcb138_madj_Log2 | -0.004 | 0.035 | 0.019 |
| hs_pcb153_madj_Log2 | -0.006 | 0.045 | 0.021 |
| hs_pcb170_madj_Log2 | -0.085 | 0.178 | 0.204 |
| hs_pcb180_madj_Log2 | -0.014 | 0.071 | 0.046 |
| hs_sumPCBs5_madj_Log2 | -0.163 | 0.220 | 0.384 |
| e3_sex_Nonefemale | 0.000 | 0.135 | 1.000 |
| e3_sex_Nonemale | 0.000 | 0.000 | 1.000 |
| h_age_None | 0.033 | 0.017 | 1.000 |
| h_cohort2 | 1.089 | 0.546 | 1.000 |
| h_cohort3 | 0.829 | 0.276 | 1.000 |
| h_cohort4 | 0.498 | 0.265 | 1.000 |
| h_cohort5 | 0.126 | 0.354 | 1.000 |
| h_cohort6 | 0.894 | 0.283 | 1.000 |
| h_edumc_None2 | 0.074 | 0.228 | 1.000 |
| h_edumc_None3 | -0.319 | 0.229 | 1.000 |
Genome-wide association studies (GWAS) have been extremely successful in identifying single nucleotide polymorphisms (SNPs) associated with traits and disease outcomes. By far the most prominent analysis technique for GWAS is to treat each SNP as independent and perform a genome-wide scan with numerous univariate regression models. This brief tutorial performs this analysis and produces some summary results on a subset of SNPs simulated to accompany the ISGlobal Exposome Data Challenge dataset.
This is not a comprehensive example of a GWAS analysis; it is designed to provide insight into the genomic data and a foundation for further analyses. Current techniques leveraging germline genetics include GxE analyses, polygenic risk scores, and the use of summary statistics for Mendelian randomization studies and TWAS (and related) studies that often leverage additional omic data.
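As a rough illustration of the scan described above, the sketch below runs one logistic regression per SNP on simulated genotypes, applies a Bonferroni threshold, and computes a genomic inflation factor from the test statistics. The data, the object names (`G`, `scan`, `hits`, `lambda.gc`), and the effect size are assumptions for illustration only and are unrelated to the HELIX data.

```r
# Illustrative simulation only; genotypes and effect sizes are assumptions
set.seed(2)
n      <- 2000
n.snps <- 50
maf    <- runif(n.snps, 0.1, 0.5)                    # minor allele frequencies
G      <- sapply(maf, function(f) rbinom(n, 2, f))   # additive coding: 0, 1 or 2 copies
colnames(G) <- paste("SNP", 1:n.snps, sep = ".")
y <- rbinom(n, 1, plogis(-1 + 0.8 * G[, 1]))         # only SNP.1 is causal

# One logistic regression per SNP, keeping the row for the SNP term
scan <- t(sapply(seq_len(n.snps), function(j) {
  fit <- glm(y ~ G[, j], family = binomial)
  summary(fit)$coefficients[2, ]
}))
rownames(scan) <- colnames(G)

# Bonferroni-corrected significance threshold
bonferroni <- 0.05 / n.snps
hits <- rownames(scan)[scan[, "Pr(>|z|)"] < bonferroni]

# Genomic inflation factor: median observed 1-df chi-square over its null median
lambda.gc <- median(scan[, "z value"]^2) / qchisq(0.5, df = 1)
```

A value of `lambda.gc` well above 1 would suggest inflated test statistics, for example from unadjusted population structure.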
# Outcome
outcome.Name <- "hs_bmi_c_cat" # "hs_asthma" # "hs_bmi_c_cat" "hs_zbmi_who"
# Covariates
covariate.Names <- c("h_mbmi_None","e3_sex_None","h_age_None","h_cohort","h_edumc_None","ethn_PC1","ethn_PC2")
# SNPs
snp.Names <- paste("SNP", 1:1000, sep=".")
# Analysis models to run
univariate <- T
load(paste0(work.dir, "/Data/HELIX.MultiAssayExperiment.RData")) # not recommended way of storing genomewide data
variables <- c(covariate.Names, "h_ethnicity_cauc", snp.Names)
# 1) select variables but keep the MultiAssayExperiment format; 2) intersectColumns() keeps only individuals with complete data; 3) wideFormat() returns a DataFrame
d <- wideFormat(intersectColumns(helix_ma[variables, ,]), colDataCols=outcome.Name)
## harmonizing input:
## removing 7854 sampleMap rows not in names(experiments)
# Create design matrix
X <- d[,paste0("genome_", snp.Names)]
names(X) <- snp.Names
X <- as.matrix(X)
# Create the outcome variable
Y <- d[,outcome.Name] # outcome
if(outcome.Name=="hs_bmi_c_cat") { Y <- ifelse(as.numeric(Y)>=3, 1, 0)}
# Create the covariate design matrix
U <- d[,c(paste0("covariates_", covariate.Names[1:5]), paste0("proteome.cov_", covariate.Names[6:7]))]
names(U) <- covariate.Names
U[,c("h_cohort","e3_sex_None","h_edumc_None")] <- lapply(U[,c("h_cohort","e3_sex_None","h_edumc_None")], factor)
U[,c("h_mbmi_None", "h_age_None","ethn_PC1","ethn_PC2")] <- lapply(U[,c("h_mbmi_None", "h_age_None","ethn_PC1","ethn_PC2")], as.numeric)
U <- model.matrix(as.formula(paste("~-1+", paste(covariate.Names, collapse="+"))), data=U)
# Other variables for analysis
N <- nrow(d) # number of individuals in the analysis
Q <- ncol(U) # number of covariates in the matrix U
P <- ncol(X) # number of SNPs in the matrix X
plot(d$proteome.cov_ethn_PC1, d$proteome.cov_ethn_PC2, pch=16, col=ifelse(d$proteome.cov_h_ethnicity_cauc=="yes", 1, 2),
xlab="Component 1", ylab="Component 2")
legend(x="topleft", legend=c("Caucasian", "Other"), col=c(1,2), pch=16)
cormat <- round(cor(X[,1:(P/5)], use="complete.obs"), 2)
cormat[lower.tri(cormat)]<- NA
melted_cormat <- melt(cormat)
ggplot(data = melted_cormat, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+
theme(axis.text.x = element_blank(), axis.text.y = element_blank())+
labs(y= "SNPs", x = "SNPs")+
coord_fixed()
if(univariate) {
univariate.results <- t(sapply(1:P, FUN=function(p) { # looping over the index p makes it easy to write results to file one SNP at a time
x <- X[,p]
reg <- glm(Y~x+U, family=binomial) # perform logistic regression
s.reg <- summary(reg) # get the summary for the regression
c.reg <- s.reg$coef["x",] # select the coefficients for the SNP
write.table(t(c(snp.Names[p], c.reg)), file="GenomeUnivariateResults.txt", append=ifelse(p==1, F, T), quote=F, sep="\t", col.names=ifelse(p==1, T, F), row.names=F)
return(c.reg) # returning the coefficients is fine for a small number of SNPs; with many, rely on the file written above to avoid memory issues
}, simplify=T))
univariate.results <- data.frame(snp.Names,univariate.results)
names(univariate.results) <- c("SNP.Name","Estimate", "SD","Z.statistic", "P.value")
univariate.results$P.value <- format(univariate.results$P.value, scientific=T)
}
if(univariate) { kable(univariate.results[as.numeric(univariate.results$P.value)<0.05,], digits=3, align="c", row.names=FALSE, col.names=c("SNP","Estimate", "SD","Z Statistics", "P-value"))}
| SNP | Estimate | SD | Z Statistics | P-value |
|---|---|---|---|---|
| SNP.9 | 0.329 | 0.137 | 2.410 | 1.594410e-02 |
| SNP.10 | 0.321 | 0.112 | 2.864 | 4.179853e-03 |
| SNP.14 | 0.230 | 0.113 | 2.029 | 4.244698e-02 |
| SNP.15 | 0.341 | 0.152 | 2.248 | 2.456820e-02 |
| SNP.16 | 0.278 | 0.098 | 2.825 | 4.724717e-03 |
| SNP.17 | 0.238 | 0.098 | 2.433 | 1.496590e-02 |
| SNP.20 | 0.219 | 0.096 | 2.278 | 2.270211e-02 |
| SNP.21 | 0.193 | 0.097 | 1.992 | 4.639044e-02 |
| SNP.44 | 0.308 | 0.125 | 2.463 | 1.377572e-02 |
| SNP.46 | 0.226 | 0.111 | 2.040 | 4.130533e-02 |
| SNP.52 | 0.449 | 0.165 | 2.720 | 6.522377e-03 |
| SNP.62 | 0.211 | 0.104 | 2.038 | 4.153780e-02 |
| SNP.100 | 0.278 | 0.075 | 3.697 | 2.182341e-04 |
| SNP.125 | -0.256 | 0.123 | -2.071 | 3.831718e-02 |
| SNP.132 | -0.207 | 0.103 | -2.002 | 4.529343e-02 |
| SNP.183 | 0.417 | 0.079 | 5.278 | 1.302509e-07 |
| SNP.228 | 0.281 | 0.100 | 2.807 | 5.003849e-03 |
| SNP.259 | 0.328 | 0.161 | 2.033 | 4.200655e-02 |
| SNP.278 | 0.370 | 0.076 | 4.874 | 1.095336e-06 |
| SNP.285 | 0.242 | 0.101 | 2.405 | 1.615891e-02 |
| SNP.297 | 0.524 | 0.081 | 6.484 | 8.937938e-11 |
| SNP.365 | -0.373 | 0.189 | -1.974 | 4.837617e-02 |
| SNP.436 | -0.325 | 0.133 | -2.455 | 1.409099e-02 |
| SNP.502 | 0.345 | 0.077 | 4.475 | 7.647719e-06 |
| SNP.532 | 0.257 | 0.119 | 2.163 | 3.057012e-02 |
| SNP.573 | 0.950 | 0.099 | 9.585 | 9.253401e-22 |
| SNP.575 | 0.292 | 0.124 | 2.356 | 1.848794e-02 |
| SNP.602 | -0.235 | 0.101 | -2.319 | 2.037771e-02 |
| SNP.626 | 0.367 | 0.142 | 2.591 | 9.577285e-03 |
| SNP.632 | 0.206 | 0.101 | 2.030 | 4.235610e-02 |
| SNP.645 | 0.239 | 0.109 | 2.184 | 2.894422e-02 |
| SNP.651 | 0.496 | 0.087 | 5.677 | 1.373821e-08 |
| SNP.655 | -0.331 | 0.160 | -2.074 | 3.807018e-02 |
| SNP.691 | -0.267 | 0.126 | -2.118 | 3.420752e-02 |
| SNP.703 | -0.243 | 0.101 | -2.409 | 1.598263e-02 |
| SNP.743 | 0.194 | 0.098 | 1.968 | 4.908090e-02 |
| SNP.749 | 0.319 | 0.119 | 2.684 | 7.283509e-03 |
| SNP.750 | 0.228 | 0.096 | 2.363 | 1.812390e-02 |
| SNP.757 | -0.192 | 0.098 | -1.964 | 4.953584e-02 |
| SNP.761 | -0.371 | 0.171 | -2.177 | 2.951590e-02 |
| SNP.784 | 0.219 | 0.100 | 2.179 | 2.930437e-02 |
| SNP.787 | 0.213 | 0.102 | 2.082 | 3.731017e-02 |
| SNP.802 | 0.509 | 0.195 | 2.610 | 9.054696e-03 |
| SNP.816 | 0.249 | 0.107 | 2.325 | 2.006790e-02 |
| SNP.830 | 0.312 | 0.158 | 1.975 | 4.824190e-02 |
| SNP.874 | -0.490 | 0.173 | -2.825 | 4.730599e-03 |
| SNP.893 | -0.212 | 0.099 | -2.146 | 3.189596e-02 |
| SNP.895 | -0.227 | 0.105 | -2.161 | 3.068424e-02 |
| SNP.896 | -0.277 | 0.124 | -2.231 | 2.569164e-02 |
| SNP.900 | 0.201 | 0.076 | 2.635 | 8.408207e-03 |
| SNP.942 | 0.330 | 0.081 | 4.073 | 4.633300e-05 |
| SNP.943 | 0.228 | 0.093 | 2.457 | 1.399408e-02 |
| SNP.944 | 0.356 | 0.122 | 2.915 | 3.559579e-03 |
| SNP.945 | 0.236 | 0.107 | 2.200 | 2.782052e-02 |
| SNP.947 | 0.247 | 0.104 | 2.361 | 1.824135e-02 |
| SNP.961 | -0.205 | 0.103 | -1.999 | 4.565292e-02 |
| SNP.982 | -0.315 | 0.130 | -2.428 | 1.518755e-02 |
| SNP.983 | -0.352 | 0.177 | -1.985 | 4.709611e-02 |
neglog.pvalues <- -log10(as.numeric(univariate.results$P.value))
plot(1:nrow(univariate.results), neglog.pvalues,
pch=16, xaxt="n", ylim=c(0, max(neglog.pvalues, 3)),
ylab="-log10(p-value)", xlab="SNPs")
abline(h=-log10(0.05/nrow(univariate.results)), lty=2, lwd=2, col=2)
pvalues <- as.numeric(univariate.results$P.value)
r <- gcontrol2(pvalues, pch=16)
lambda <- round(r$lambda,3)
text(x=1, y=5, labels=bquote(lambda == .(lambda)), cex=2)
When there are two or more high-dimensional layers of omic data, there are several ways to approach the analysis.
In this example, we investigate approach #1, in which all pairwise regressions are performed between two omic layers. We then explore the results visually (#1.1 above) and through data-driven approaches (#1.3 above). We note that in this discussion and example we do not have a specific, defined outcome that we are ultimately interested in exploring. The link of multi-omic data to a specific outcome is what we primarily explore in the workshop lectures and labs.
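The pairwise strategy can be sketched on simulated data as follows. This is a minimal illustration under stated assumptions: two small simulated layers (`W` for exposures, `X` for proteins), no covariates, and a Benjamini-Hochberg adjustment across all pairs, which is one possible choice and not necessarily the one used in this lab.

```r
# Illustrative simulation only; layer sizes and the single true association are assumptions
set.seed(3)
n <- 300
W <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, paste0("exposure", 1:3)))
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("protein", 1:5)))
X[, "protein2"] <- X[, "protein2"] + 0.6 * W[, "exposure1"]   # one true association

# Enumerate every exposure-protein pair and regress the protein on the exposure
pairs <- expand.grid(exposure = colnames(W), protein = colnames(X),
                     stringsAsFactors = FALSE)
pairs$estimate <- NA_real_
pairs$p.value  <- NA_real_
for (i in seq_len(nrow(pairs))) {
  fit <- lm(X[, pairs$protein[i]] ~ W[, pairs$exposure[i]])
  est <- summary(fit)$coefficients[2, ]          # row for the exposure term
  pairs$estimate[i] <- est["Estimate"]
  pairs$p.value[i]  <- est["Pr(>|t|)"]
}

# Account for the number of tests performed across all pairs
pairs$q.value <- p.adjust(pairs$p.value, method = "BH")
pairs[order(pairs$p.value), ][1:3, ]
```

Storing the results in long format (one row per pair) keeps the estimates and p-values as numeric columns, which simplifies later filtering and plotting.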
codebook <- read.table(paste0(work.dir, "/Data/codebook.txt"), sep="\t", header=T)
# Covariates
covariate.Names <- c("e3_sex_None","h_cohort", "age_sample_years","ethn_PC1","ethn_PC2","hs_dift_mealblood_imp","blood_sam4")
# Exposure related
exposure.group <- "Organochlorines" # {"Metals", "Organochlorines", "Organophosphate pesticides", "PBDE", "PFAS", "Phenols", "Phthalates", "All"}
if(exposure.group=="All") { exposure.Names <- as.character(codebook$variable_name[codebook$domain=="Chemicals"]) }
if(exposure.group!="All") { exposure.Names <- as.character(codebook$variable_name[codebook$family==exposure.group]) }
exposure.Names <- exposure.Names[grep("madj", exposure.Names)] # select only mother measures of exposure
# Proteome
proteome.Names <- c("Adiponectin","CRP","APO.A1","APO.B","APO.E","IL1beta","IL6","MCP1","Leptin","HGF","INSULIN","TNFalfa","BAFF","Cpeptide","PAI1","IL8","FGFBasic","GCSF","IL10","IL13","IL12","Eotaxin","IL17","MIP1alfa","MIP1beta","IL15","EGF","IL5","IFNgamma","IFNalfa","IL1RA","IL2","IP10","IL2R","MIG","IL4")
# Analysis models to run
univariate <- T
load(paste0(work.dir, "/Data/HELIX.MultiAssayExperiment.RData"))
variables <- c(covariate.Names, exposure.Names, proteome.Names)
# 1) select variables but keep the MultiAssayExperiment format; 2) intersectColumns() keeps only individuals with complete data; 3) wideFormat() returns a DataFrame
d <- wideFormat(intersectColumns(helix_ma[variables, ,]))
## harmonizing input:
## removing 6732 sampleMap rows not in names(experiments)
# Create design matrix
X <- as.data.frame(d[,paste("proteome",proteome.Names,sep="_")])
names(X) <- proteome.Names
X <- scale(X, center=T, scale=T)
# Create exposure design matrix
W <- as.data.frame(apply(d[,paste("exposome",exposure.Names,sep="_")],2,as.numeric))
names(W) <- exposure.Names
W <- scale(W, center=T, scale=T)
# Create the covariate design matrix
U <- d[,c(paste0("covariates_", covariate.Names[1:2]), paste0("metabol_urine.cov_", covariate.Names[3:7]))]
names(U) <- covariate.Names
U[,c("h_cohort","e3_sex_None")] <- lapply(U[,c("h_cohort","e3_sex_None")], factor)
U[,c("age_sample_years","ethn_PC1","ethn_PC2","hs_dift_mealblood_imp","blood_sam4")] <- lapply(U[,c("age_sample_years","ethn_PC1","ethn_PC2","hs_dift_mealblood_imp","blood_sam4")], as.numeric)
U <- model.matrix(as.formula(paste("~-1+", paste(covariate.Names, collapse="+"))), data=U)
# Other variables for analysis
N <- nrow(d) # number of individuals in the analysis
Q <- ncol(U) # number of covariates in the matrix U
P <- ncol(X) # number of proteome features in the matrix X
R <- ncol(W) # number of exposome features in the matrix W
if(univariate) {
univariate.results <- {}
beta.results <- matrix(0, nrow=R, ncol=P)
p.results <- matrix(0, nrow=R, ncol=P)
for(r in 1:R) { # loop through exposures
w <- W[,r]
for(p in 1:P) { # loop through proteins
x <- X[,p]
reg <- glm(x~w+U, family=gaussian)
s.reg <- summary(reg) # get the summary for the regression
c.reg <- s.reg$coef["w",] # select the coefficients for the exposure
r.reg <- c(exposure.Names[r], proteome.Names[p], c.reg)
write.table(t(r.reg), file="ExposomeProteomeUnivariateResults.txt", append=ifelse(p*r==1, F, T), quote=F, sep="\t", col.names=ifelse(p*r==1, T, F), row.names=F)
beta.results[r,p] <- as.numeric(r.reg["Estimate"])
p.results[r,p] <- as.numeric(r.reg["Pr(>|t|)"])
univariate.results <- rbind(univariate.results, r.reg)
}
}
univariate.results <- as.data.frame(univariate.results)
names(univariate.results) <- c("Exposure", "Proteome", names(univariate.results)[3:6])
beta.results <- as.data.frame(beta.results)
p.results <- as.data.frame(p.results)
names(beta.results) <- proteome.Names
names(p.results) <- proteome.Names
row.names(beta.results) <- exposure.Names
row.names(p.results) <- exposure.Names
}
beta.results.long <- melt(as.matrix(beta.results))
names(beta.results.long) <- c("Exposure", "Protein", "Effect")
beta.pca <- prcomp(beta.results, scale = TRUE)
if(univariate) { kable(univariate.results[as.numeric(univariate.results[,"Pr(>|t|)"]) <0.05,], digits=3, align="c", row.names=FALSE, col.names=c("Exposure", "Protein","Estimate", "SD","t Value", "P Value"))} # p-values are stored as character, so convert before comparing
| Exposure | Protein | Estimate | SD | t Value | P Value |
|---|---|---|---|---|---|
| hs_ddt_madj_Log2 | PAI1 | 0.0655980263911758 | 0.0222916816296171 | 2.94271322734222 | 0.00332119390441221 |
| hs_ddt_madj_Log2 | FGFBasic | -0.0591217726983932 | 0.0288577947729325 | -2.04872801832547 | 0.040723688453088 |
| hs_hcb_madj_Log2 | APO.A1 | 0.0808454866534083 | 0.0371193241724781 | 2.17798918638047 | 0.0296169818286333 |
| hs_hcb_madj_Log2 | FGFBasic | -0.0681603743455859 | 0.0340517970716978 | -2.00166746565742 | 0.0455637497746824 |
| hs_hcb_madj_Log2 | IL10 | -0.111039202752254 | 0.0367216519466496 | -3.02380739607181 | 0.0025535488315567 |
| hs_pcb138_madj_Log2 | IL1beta | -0.0807185736973553 | 0.0393129713954478 | -2.05323003660573 | 0.0402844267333609 |
| hs_pcb170_madj_Log2 | CRP | -0.131402752930642 | 0.050864664804739 | -2.58337990499054 | 0.00991083721192663 |
| hs_pcb170_madj_Log2 | IL1beta | -0.131460818837491 | 0.0481399976222637 | -2.73080235418818 | 0.00641819810301397 |
| hs_pcb170_madj_Log2 | IL6 | -0.107979982767619 | 0.0487130259305386 | -2.21665521089966 | 0.0268489486073114 |
| hs_pcb170_madj_Log2 | MIP1beta | 0.0893370838758477 | 0.0426613014618386 | 2.09410123026278 | 0.0364775260138151 |
| hs_pcb180_madj_Log2 | APO.A1 | 0.110222789580892 | 0.0465039907207924 | 2.37017915822888 | 0.0179497811573896 |
| hs_sumPCBs5_madj_Log2 | CRP | -0.129142690409208 | 0.0506317312054623 | -2.55062758737501 | 0.0108863794381994 |
| hs_sumPCBs5_madj_Log2 | IL1beta | -0.0998494637762745 | 0.0479831741840394 | -2.08092660550763 | 0.0376696679065281 |
| hs_sumPCBs5_madj_Log2 | IL6 | -0.101482269912699 | 0.0484979388201711 | -2.09250686485858 | 0.0366200626606123 |
neglog.pvalues <- -log10(as.numeric(univariate.results[,"Pr(>|t|)"]))
plot(1:nrow(univariate.results), neglog.pvalues,
pch=16, xaxt="n", ylim=c(0, max(neglog.pvalues, 3)),
ylab="-log(p-value)", xlab="",
col=match(univariate.results$Exposure, exposure.Names))
abline(h=-log10(0.05/nrow(univariate.results)), lty=2, lwd=2, col=2)
axis(side=1, at=(1:R)*(P)-P*.5, labels=FALSE)
text(x=(1:R)*(P), y=par("usr")[3]-0.1, xpd=NA,
labels=exposure.Names, adj=1.2, srt=45, cex=.6)
ggplot(beta.results.long,
aes(fill=Exposure, y = Effect, x = Protein)) +
geom_bar(position="dodge", stat="identity") +
ggtitle("Title") +
facet_wrap(~Protein) +
facet_grid(rows = vars(Exposure)) +
xlab("") +
ylab("Effect") +
theme(text = element_text(size=1),
axis.text.x = element_text(angle = 45, vjust = 1,
hjust = 1, size=7),
axis.text.y = element_text(size=10),
legend.title = element_blank(),
legend.text = element_text(size=10))
Here, we examine the pairwise results (i.e. the effect estimates of \(\beta\)s from the regression of each protein on each exposure) by treating the estimates as the data and performing hierarchical clustering and principal component analysis, as examples.
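To make the idea of treating estimates as data concrete, the sketch below uses a simulated matrix of effect estimates (rows are exposures, columns are proteins) built from two hypothetical effect profiles. It then clusters the exposures and runs a PCA using base R only. The matrix `beta` and its dimensions are assumptions for illustration, not results from this lab.

```r
# Illustrative simulation only; the effect profiles are assumptions
set.seed(4)
n.prot    <- 12
profile.a <- rnorm(n.prot)     # effect profile shared by exposures 1-3
profile.b <- rnorm(n.prot)     # effect profile shared by exposures 4-6
beta <- rbind(
  t(replicate(3, profile.a + rnorm(n.prot, sd = 0.1))),
  t(replicate(3, profile.b + rnorm(n.prot, sd = 0.1)))
)
rownames(beta) <- paste0("exposure", 1:6)
colnames(beta) <- paste0("protein", 1:n.prot)

# Hierarchical clustering of exposures by their effect profiles across proteins
hc     <- hclust(dist(beta), method = "ward.D2")
groups <- cutree(hc, k = 2)

# PCA with exposures as observations and proteins as variables
pca           <- prcomp(beta, scale. = TRUE)
var.explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var.explained, 3)
```

Exposures with similar effect profiles across proteins fall into the same cluster and lie close together on the leading principal components.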
heatmap.2(x=as.matrix(beta.results), hclustfun=function(d) hclust(d, method = "ward.D2"), trace="none", cexRow =.5, cexCol = .5)
fviz_eig(beta.pca)
fviz_pca_var(beta.pca,
title="PCA by Protein Contribution",
col.var = "contrib", # Color by proportional amount to the PC
gradient.cols = c("green", "blue", "red"),
repel = TRUE # Avoid text overlapping
)
fviz_pca_ind(beta.pca,
title="PCA by Exposure Contribution",
col.ind = "cos2", # Color by total PC amount for each "individual"
gradient.cols = c("green", "blue", "red"),
repel = TRUE # Avoid text overlapping
)